Commit

Merge branch 'main' into dependabot/github_actions/benchmark-action/github-action-benchmark-1.15.0
timbray authored Nov 9, 2022
2 parents c7632c9 + fb23177 commit 84e1028
Showing 8 changed files with 351 additions and 145 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/codeql-analysis.yaml
@@ -43,7 +43,7 @@ jobs:

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@18fe527fa8b29f134bb91f32f1a5dc5abb15ed7f
uses: github/codeql-action/init@c3b6fce4ee2ca25bc1066aa3bf73962fda0e8898
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
@@ -54,7 +54,7 @@ jobs:
# Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@18fe527fa8b29f134bb91f32f1a5dc5abb15ed7f
uses: github/codeql-action/autobuild@c3b6fce4ee2ca25bc1066aa3bf73962fda0e8898

# ℹ️ Command-line programs to run using the OS shell.
# 📚 https://git.io/JvXDl
@@ -68,4 +68,4 @@ jobs:
# make release

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@18fe527fa8b29f134bb91f32f1a5dc5abb15ed7f
uses: github/codeql-action/analyze@c3b6fce4ee2ca25bc1066aa3bf73962fda0e8898
34 changes: 34 additions & 0 deletions PATTERNS.md
@@ -66,6 +66,40 @@ are two Exists Patterns that would match the Events above:
If a Field in a Pattern contains an Exists Pattern, it
**MUST NOT** contain any other values.

Exists Patterns currently only work on leaf nodes. That is to
say, given this event:

```json
{ "a": { "b": 1 } }
```

The following pattern will not match:

```json
{ "a": [ {"exists": true} ] }
```

We may be able to change this in future.

The case of empty arrays is interesting. Consider this event:

```json
{ "a": [] }
```

Then `"exists": true` does not match but `"exists": false` does.
I.e., only the first of the two sample patterns below matches.

```json
{ "a": [ { "exists": false } ] }
```
```json
{ "a": [ { "exists": true } ] }
```
This makes sense in the context of the leaf-node semantics; there
really is no value for the `"a"` field.
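
The leaf-node behaviour described above can be reproduced outside Quamina with a small sketch. This is not Quamina's flattener or matcher: `leafPaths` and `existsMatches` are hypothetical helpers, and the dotted path syntax is an assumption made for illustration. The sketch only records paths that end in a scalar value, so an object-valued field and an empty array both leave no entry for `"a"`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leafPaths records the path of every scalar (leaf) value in a decoded JSON
// document. Objects and arrays contribute no path of their own; the elements
// of an array share the path of the array. Hypothetical helper, not Quamina code.
func leafPaths(v any, prefix string, out map[string]bool) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			p := k
			if prefix != "" {
				p = prefix + "." + k
			}
			leafPaths(child, p, out)
		}
	case []any:
		for _, child := range t {
			leafPaths(child, prefix, out)
		}
	default:
		// strings, numbers, booleans and null are leaves
		out[prefix] = true
	}
}

// existsMatches reports whether an {"exists": want} test on path holds for
// the event under the leaf-only rule sketched above.
func existsMatches(event string, path string, want bool) (bool, error) {
	var doc any
	if err := json.Unmarshal([]byte(event), &doc); err != nil {
		return false, err
	}
	leaves := make(map[string]bool)
	leafPaths(doc, "", leaves)
	return leaves[path] == want, nil
}

func main() {
	events := []string{`{ "a": { "b": 1 } }`, `{ "a": [] }`, `{ "a": 1 }`}
	for _, event := range events {
		isTrue, _ := existsMatches(event, "a", true)
		isFalse, _ := existsMatches(event, "a", false)
		fmt.Printf("%-22s exists:true=%v exists:false=%v\n", event, isTrue, isFalse)
	}
}
```

In this sketch the empty array yields no leaf at all, which is why only the `"exists": false` test succeeds for `{ "a": [] }`.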


### Anything-But Pattern

The Pattern Type of an Anything-But Pattern is
6 changes: 5 additions & 1 deletion README.md
@@ -131,10 +131,14 @@ one `*` character. The architecture probably allows
support for a larger subset of regular expressions,
eventually.

The `"exists":true` and `"exists":false` patterns
have corner cases; details are covered in
[Patterns in Quamina](PATTERNS.md).

Number matching is weak - the number has to appear
exactly the same in the Pattern and the Event. I.e.,
Quamina doesn't know that 35, 35.000, and 3.5e1 are the
same number. There's a fix for this in the code which
same number. There's a fix for this in the code which
is not yet activated because it causes a
significant performance penalty, so the API needs to
be enhanced to only ask for it when you need it.
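
To make the limitation concrete, here is a minimal sketch contrasting textual comparison with numeric comparison. It is not the fix mentioned above and does not use any Quamina API: `sameText` and `sameValue` are hypothetical helpers, and parsing into `float64` is just one possible way to canonicalize, with the usual precision limits.

```go
package main

import (
	"fmt"
	"strconv"
)

// sameText is the strict rule described above: the number must appear
// byte-for-byte the same in the Pattern and the Event.
func sameText(a, b string) bool {
	return a == b
}

// sameValue parses both spellings as float64 and compares the values.
// Hypothetical helper for illustration only.
func sameValue(a, b string) (bool, error) {
	x, err := strconv.ParseFloat(a, 64)
	if err != nil {
		return false, err
	}
	y, err := strconv.ParseFloat(b, 64)
	if err != nil {
		return false, err
	}
	return x == y, nil
}

func main() {
	forms := []string{"35", "35.000", "3.5e1"}
	for _, f := range forms[1:] {
		byValue, err := sameValue(forms[0], f)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%q vs %q: text=%v value=%v\n", forms[0], f, sameText(forms[0], f), byValue)
	}
}
```

The extra parsing step is the kind of per-value work that explains why such a comparison would cost more than a plain byte match.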
207 changes: 123 additions & 84 deletions core_matcher.go
@@ -18,35 +18,45 @@ import (
"sync/atomic"
)

// coreMatcher uses a finite automaton to implement the matchesForJSONEvent and MatchesForFields functions.
// state is the start of the automaton
// namesUsed is a map of field names that are used in any of the patterns that this automaton encodes. Typically,
// patterns only consider a subset of the fields in an incoming data object, and there is no reason to consider
// fields that do not appear in patterns when using the automaton for matching
// the updateable fields are grouped into the coreStart member so they can be updated atomically using atomic.Load()
// and atomic.Store(). This is necessary for coreMatcher to be thread-safe.
// coreMatcher uses an automaton to implement addPattern and matchesForFields.
// There are two levels of concurrency here. First, the lock field in this struct must be held by any goroutine
// that is executing addPattern(), i.e. only one thread may be updating the state machine at one time.
// However, any number of goroutines may in parallel be executing matchesForFields while the addPattern
// update is in progress. The updateable atomic.Value allows the addPattern thread to change the maps and
// slices in the structure atomically with atomic.Load() while matchesForFields threads are reading them.
type coreMatcher struct {
updateable atomic.Value // always holds a *coreStart
updateable atomic.Value // always holds a *coreFields
lock sync.Mutex
}
type coreStart struct {
state *fieldMatcher
namesUsed map[string]bool
presumedExistFalseMatches *matchSet

// coreFields groups the updateable fields in coreMatcher.
// state is the start of the automaton.
// namesUsed is a map of field names that are used in any of the patterns that this automaton encodes. Typically,
// patterns only consider a subset of the fields in an incoming data object, and there is no reason to consider
// fields that do not appear in patterns when using the automaton for matching.
// fakeField is used when the flattener for an event returns no fields, because it could still match if
// there were patterns with "exists":false. So in this case we run one fake field through the matcher
// which will cause it to notice that any "exists":false patterns should match.
type coreFields struct {
state *fieldMatcher
namesUsed map[string]bool
}

func newCoreMatcher() *coreMatcher {
// because of the way the matcher works, to serve its purpose of ensuring that "exists":false matches
// will be detected, the Path has to be lexically greater than any field path that appears in
// "exists":false. The value with byteCeiling works because that byte can't actually appear in any
// user-supplied path-name because it's not valid in UTF-8
m := coreMatcher{}
m.updateable.Store(&coreStart{
state: newFieldMatcher(),
namesUsed: make(map[string]bool),
presumedExistFalseMatches: newMatchSet(),
m.updateable.Store(&coreFields{
state: newFieldMatcher(),
namesUsed: make(map[string]bool),
})
return &m
}

func (m *coreMatcher) start() *coreStart {
return m.updateable.Load().(*coreStart)
func (m *coreMatcher) start() *coreFields {
return m.updateable.Load().(*coreFields)
}

// addPattern - the patternBytes is a JSON object. The X is what the matcher returns to indicate that the
@@ -60,14 +70,11 @@ func (m *coreMatcher) addPattern(x X, patternJSON string) error {
sort.Slice(patternFields, func(i, j int) bool { return patternFields[i].path < patternFields[j].path })

// only one thread can be updating at a time
// NOTE: threads can be calling MatchesFor* functions at any time as we update the automaton. The goal is to
// maintain consistency during updates, in the sense that a pattern that has been matching events will not
// stop working during an update.
m.lock.Lock()
defer m.lock.Unlock()

// we build up the new coreMatcher state in freshStart so we can atomically switch it in once complete
freshStart := &coreStart{}
freshStart := &coreFields{}
freshStart.namesUsed = make(map[string]bool)
current := m.start()
freshStart.state = current.state
@@ -78,25 +85,27 @@ func (m *coreMatcher) addPattern(x X, patternJSON string) error {
for used := range patternNamesUsed {
freshStart.namesUsed[used] = true
}
freshStart.presumedExistFalseMatches = newMatchSet()
for presumedExistsFalseMatch := range current.presumedExistFalseMatches.set {
freshStart.presumedExistFalseMatches = freshStart.presumedExistFalseMatches.addX(presumedExistsFalseMatch)
}

// now we add each of the name/value pairs in fields slice to the automaton, starting with the start state -
// the addTransition for a field returns a list of the fieldMatchers transitioned to for that name/val
// combo.
states := []*fieldMatcher{current.state}
for _, field := range patternFields {
var nextStates []*fieldMatcher
for _, state := range states {
ns := state.addTransition(field)

// special handling for exists:false, in which case there can be only one val and one next state
if field.vals[0].vType == existsFalseType {
ns[0].addExistsFalseFailure(x)
freshStart.presumedExistFalseMatches = freshStart.presumedExistFalseMatches.addX(x)
// separate handling for field exists:true/false and regular field name/val matches. Since the exists
// true/false are only allowed one value, we can test vals[0] to figure out which type
for _, state := range states {
var ns []*fieldMatcher
switch field.vals[0].vType {
case existsTrueType:
ns = state.addExists(true, field)
case existsFalseType:
ns = state.addExists(false, field)
default:
ns = state.addTransition(field)
}

nextStates = append(nextStates, ns...)
}
states = nextStates
@@ -106,9 +115,7 @@ func (m *coreMatcher) addPattern(x X, patternJSON string) error {
// by matching each field in the pattern so update the matches value to indicate this (skipping those that
// are only there to serve exists:false processing)
for _, endState := range states {
if !endState.fields().existsFalseFailures.contains(x) {
endState.addMatch(x)
}
endState.addMatch(x)
}
m.updateable.Store(freshStart)

@@ -129,76 +136,108 @@ func (m *coreMatcher) matchesForJSONEvent(event []byte) ([]X, error) {
if err != nil {
return nil, err
}

// see the commentary on coreMatcher for an explanation of this.
// tl;dr: If the flattener returns no fields because there's nothing in the event that's mentioned in
// any patterns, the event could still match if there are only "exists":false patterns.
if len(fields) == 0 {
fields = []Field{
{
Path: []byte{byte(byteCeiling)},
Val: []byte(""),
ArrayTrail: []ArrayPos{{0, 0}},
},
}
}

return m.matchesForFields(fields)
}

// matchesForFields takes a list of Field structures and sorts them by pathname; the fields in a pattern to
// be matched are similarly sorted; thus running an automaton over them works
// matchesForFields takes a list of Field structures, sorts them by pathname, and launches the field-matching
// process. The fields in a pattern to match are similarly sorted; thus running an automaton over them works
func (m *coreMatcher) matchesForFields(fields []Field) ([]X, error) {
sort.Slice(fields, func(i, j int) bool { return string(fields[i].Path) < string(fields[j].Path) })
return m.matchesForSortedFields(fields).matches(), nil
}
matches := newMatchSet()

// proposedTransition represents a suggestion that the name/value pair at fields[fieldIndex] might allow a transition
// in the indicated state
type proposedTransition struct {
matcher *fieldMatcher
fieldIndex int
// for each of the fields, we'll try to match the automaton start state to that field - the tryToMatch
// routine will, in the case that there's a match, call itself to see if subsequent fields after the
// first matched will transition through the machine and eventually achieve a match
s := m.start()
for i := 0; i < len(fields); i++ {
tryToMatch(fields, i, s.state, matches)
}
return matches.matches(), nil
}

// matchesForSortedFields runs the provided list of name/value pairs against the automaton and returns
// a possibly-empty list of the patterns that match
func (m *coreMatcher) matchesForSortedFields(fields []Field) *matchSet {
failedExistsFalseMatches := newMatchSet()
matches := newMatchSet()
// tryToMatch tries to match the field at fields[index] to the provided state. If it does match and generate
// 1 or more transitions to other states, it calls itself recursively to see if any of the remaining fields
// can continue the process by matching that state.
func tryToMatch(fields []Field, index int, state *fieldMatcher, matches *matchSet) {
stateFields := state.fields()

// The idea is that we add potential field transitions to the proposals list; any time such a transition
// succeeds, i.e. matches a particular field and moves to a new state, we propose transitions from that
// state on all the following fields in the event
// Start by giving each field a chance to match against the start state. Doing it by pre-allocating the
// proposals and filling in their values is observably faster than the more idiomatic append()
proposals := make([]proposedTransition, len(fields))
for i := range fields {
proposals[i].fieldIndex = i
proposals[i].matcher = m.start().state
// transition on exists:true?
existsTrans, ok := stateFields.existsTrue[string(fields[index].Path)]
if ok {
matches = matches.addXSingleThreaded(existsTrans.fields().matches...)
for nextIndex := index + 1; nextIndex < len(fields); nextIndex++ {
if noArrayTrailConflict(fields[index].ArrayTrail, fields[nextIndex].ArrayTrail) {
tryToMatch(fields, nextIndex, existsTrans, matches)
}
}
}

// as long as there are still potential transitions
for len(proposals) > 0 {
// go slices could usefully have a "pop" primitive
lastIndex := len(proposals) - 1
proposal := proposals[lastIndex]
proposals = proposals[0:lastIndex]
// an exists:false transition is possible if there is no matching field in the event
// func checkExistsFalse(stateFields *fmFields, fields []Field, index int, matches *matchSet) {
checkExistsFalse(stateFields, fields, index, matches)

// generate the possibly-empty list of transitions from state on the name/value pair
nextStates := proposal.matcher.transitionOn(&fields[proposal.fieldIndex])
// try to transition through the machine
nextStates := state.transitionOn(&fields[index])

// for each state in the set of transitions from the proposed state
for _, nextState := range nextStates {
// if arriving at this state means we've matched one or more patterns, record that fact
matches = matches.addXSingleThreaded(nextState.fields().matches...)
// for each state in the possibly-empty list of transitions from this state on fields[index]
for _, nextState := range nextStates {
nextStateFields := nextState.fields()
matches = matches.addXSingleThreaded(nextStateFields.matches...)

// have we invalidated a presumed exists:false pattern?
for existsMatch := range nextState.fields().existsFalseFailures.set {
failedExistsFalseMatches = failedExistsFalseMatches.addXSingleThreaded(existsMatch)
// for each state we've transitioned to, give each subsequent field a chance to
// transition on it, assuming it's not in an object that's in a different element
// of the same array
for nextIndex := index + 1; nextIndex < len(fields); nextIndex++ {
if noArrayTrailConflict(fields[index].ArrayTrail, fields[nextIndex].ArrayTrail) {
tryToMatch(fields, nextIndex, nextState, matches)
}
}
// now we've run out of fields to match this nextState against. But suppose it has an exists:false
// transition, and it so happens that the exists:false pattern field is lexically larger than the other
// fields and that in fact such a field does not exist. That state would be left hanging. So…
checkExistsFalse(nextStateFields, fields, index, matches)
}
}

// for each state we've transitioned to, give each subsequent field a chance to
// transition on it, assuming it's not in an object that's in a different element
// of the same array
for nextIndex := proposal.fieldIndex + 1; nextIndex < len(fields); nextIndex++ {
if noArrayTrailConflict(fields[proposal.fieldIndex].ArrayTrail, fields[nextIndex].ArrayTrail) {
proposals = append(proposals, proposedTransition{fieldIndex: nextIndex, matcher: nextState})
func checkExistsFalse(stateFields *fmFields, fields []Field, index int, matches *matchSet) {
for existsFalsePath, existsFalseTrans := range stateFields.existsFalse {
// it seems like there ought to be a more state-machine-idiomatic way to do this but
// I thought of a few and none of them worked. Quite likely someone will figure it out eventually.
// Could get slow for big events with hundreds or more fields (not that I've ever seen that) - might
// be worthwhile switching to binary search at some field count.
var i int
var thisFieldIsAnExistsFalse bool
for i = 0; i < len(fields); i++ {
if string(fields[i].Path) == existsFalsePath {
if i == index {
thisFieldIsAnExistsFalse = true
}
break
}
}
}
for presumedExistsFalseMatch := range m.start().presumedExistFalseMatches.set {
if !failedExistsFalseMatches.contains(presumedExistsFalseMatch) {
matches = matches.addXSingleThreaded(presumedExistsFalseMatch)
if i == len(fields) {
matches = matches.addXSingleThreaded(existsFalseTrans.fields().matches...)
if thisFieldIsAnExistsFalse {
tryToMatch(fields, index+1, existsFalseTrans, matches)
} else {
tryToMatch(fields, index, existsFalseTrans, matches)
}
}
}
return matches
}

func noArrayTrailConflict(from []ArrayPos, to []ArrayPos) bool {
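
The two-level concurrency scheme described in the `coreMatcher` comment (one writer at a time under a mutex, any number of lock-free readers through an `atomic.Value`) can be sketched in isolation. This is a simplified illustration under assumptions, not the code from this commit: `registry`, `snapshot`, `current`, and `addName` are hypothetical names, and the snapshot holds only a `namesUsed` map.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// snapshot is a hypothetical stand-in for coreFields. Once stored it is
// never modified, so readers can use it without locking.
type snapshot struct {
	namesUsed map[string]bool
}

// registry is a hypothetical stand-in for coreMatcher.
type registry struct {
	updateable atomic.Value // always holds a *snapshot
	lock       sync.Mutex   // serializes writers only
}

func newRegistry() *registry {
	r := &registry{}
	r.updateable.Store(&snapshot{namesUsed: make(map[string]bool)})
	return r
}

// current returns the snapshot visible to readers; it takes no lock.
func (r *registry) current() *snapshot {
	return r.updateable.Load().(*snapshot)
}

// addName builds a fresh snapshot from the current one and switches it in
// with a single Store, so readers see either the old or the new snapshot.
func (r *registry) addName(name string) {
	r.lock.Lock()
	defer r.lock.Unlock()

	old := r.current()
	fresh := &snapshot{namesUsed: make(map[string]bool, len(old.namesUsed)+1)}
	for k := range old.namesUsed {
		fresh.namesUsed[k] = true
	}
	fresh.namesUsed[name] = true
	r.updateable.Store(fresh)
}

func main() {
	r := newRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			r.addName(fmt.Sprintf("field%d", i))
			_ = len(r.current().namesUsed) // concurrent lock-free read
		}(i)
	}
	wg.Wait()
	fmt.Println("names recorded:", len(r.current().namesUsed))
}
```

The design choice worth noting is that the writer copies the map instead of mutating it in place; a reader that loaded the old snapshot keeps a consistent view for as long as it holds the pointer.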