Kine skips compact intervals if compaction fails to complete

If the compaction loop fails partway through for any reason, the rows handled by the batches that already succeeded stay compacted and the compact-rev key in the database is updated, but the expected compact-rev value held in memory is not. On the next interval, kine sees the mismatch, thinks that some other node has compacted, and skips that interval.

This has been reported several times:
* https://github.com/k3s-io/k3s/discussions/10626#discussioncomment-10231082
* https://github.com/k3s-io/k3s/discussions/11251#discussioncomment-11183233

The first instance was in an odd multi-master Galera cluster, but the second was on plain old sqlite.

The skip happens because if any compaction batch fails, we restart the outer loop:
https://github.com/k3s-io/kine/blob/c1b2bd81f697c6b7aec85ea2562bcbcdfb981307/pkg/logstructured/sqllog/sql.go#L155-L157
without recording any of the work done by prior successful iterations of the inner loop:
https://github.com/k3s-io/kine/blob/c1b2bd81f697c6b7aec85ea2562bcbcdfb981307/pkg/logstructured/sqllog/sql.go#L165-L167
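
To make the failure mode concrete, here is a simplified model of the loop. It is only a sketch: `store`, `compactor`, `runInterval`, and the `recordPartial` switch are hypothetical names invented for illustration and are not kine's actual code. It assumes one possible fix, which is to record the last successfully compacted revision before bailing out of a failed interval.

```go
package main

import (
	"errors"
	"fmt"
)

// errLocked stands in for whatever error the database returns (hypothetical).
var errLocked = errors.New("database is locked")

// store is a hypothetical stand-in for the compact-rev key kept in the database.
type store struct {
	compactRev int64
	failAt     int64 // compact fails when asked to reach this revision (0 = never)
}

// compact persists target as the new compact revision unless a failure is injected.
func (s *store) compact(target int64) error {
	if s.failAt != 0 && target == s.failAt {
		return errLocked
	}
	s.compactRev = target
	return nil
}

// compactor holds the in-memory copy of the compact revision this node expects.
type compactor struct {
	db          *store
	expectedRev int64
}

// runInterval models one tick of the outer loop. It reports true when the
// database disagrees with expectedRev, which looks like another node compacted.
// With recordPartial set, progress from successful batches is kept on failure.
func (c *compactor) runInterval(currentRev, batch int64, recordPartial bool) (bool, error) {
	if c.db.compactRev != c.expectedRev {
		c.expectedRev = c.db.compactRev // resync, but do no work this interval
		return true, nil
	}
	compactedRev := c.expectedRev
	for compactedRev < currentRev {
		target := compactedRev + batch
		if target > currentRev {
			target = currentRev
		}
		if err := c.db.compact(target); err != nil {
			if recordPartial {
				// Assumed fix: remember what earlier batches achieved.
				c.expectedRev = compactedRev
			}
			return false, err
		}
		compactedRev = target
	}
	c.expectedRev = compactedRev
	return false, nil
}

func main() {
	for _, recordPartial := range []bool{false, true} {
		c := &compactor{db: &store{failAt: 3000}}
		_, err := c.runInterval(3000, 1000, recordPartial) // third batch fails
		c.db.failAt = 0                                     // the lock clears
		skipped, _ := c.runInterval(4000, 1000, recordPartial)
		fmt.Printf("recordPartial=%v firstErr=%v nextIntervalSkipped=%v\n", recordPartial, err, skipped)
	}
}
```

In this model, the run without `recordPartial` skips the interval after the failure because the database is ahead of the in-memory value, while the run with it carries on from the last good batch.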

We should fix that, but we should also figure out how to better handle locking errors when trying to compact.

For sqlite at least, the locking errors may be related to go-sqlite3's BeginTx ignoring TxOptions:
* https://github.com/mattn/go-sqlite3/issues/685

This is BAD, as the default behavior of sqlite transactions is to... not actually start a transaction:
https://sqlite.org/forum/info/c3cb9524bef62b67#forum11484
> A bare BEGIN (as in BEGIN DEFERRED) does not start a transaction. It turns off the auto-commit machinery so that the transaction commenced by the next statement is not automatically committed at the end of the execution of that statement. If that statement is a "read" statement, then the transaction is a read transaction. If that statement is a "write" statement, then the transaction is a write transaction. BEGIN IMMEDIATE and BEGIN EXCLUSIVE both turn off the auto-commit machinery and start a transaction (write or exclusive respectively)
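
One knob that might be worth trying: go-sqlite3 documents a `_txlock` connection-string parameter (`deferred`, `immediate`, `exclusive`) that controls which BEGIN variant the driver issues. The sketch below only shows how that parameter could be added to a DSN. The `withTxLock` helper and the example DSN are hypothetical, and whether `immediate` actually resolves the lock errors reported here is an untested assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// withTxLock returns dsn with the _txlock query parameter set to mode,
// preserving any parameters that were already present.
// Hypothetical helper, not part of kine or go-sqlite3.
func withTxLock(dsn, mode string) (string, error) {
	base, rawQuery, _ := strings.Cut(dsn, "?")
	params, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", fmt.Errorf("parsing DSN parameters: %w", err)
	}
	params.Set("_txlock", mode)
	// Encode sorts parameters by key, so their order may change.
	return base + "?" + params.Encode(), nil
}

func main() {
	// Example DSN is an assumption for illustration only.
	dsn, err := withTxLock("./db/state.db?_journal=WAL&cache=shared", "immediate")
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)
}
```

The resulting string would be passed to `sql.Open("sqlite3", dsn)`, so that transactions opened through the driver begin with BEGIN IMMEDIATE rather than a bare BEGIN.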

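Code at sql.go#L155-L157 (failure path that restarts the outer loop):

	logrus.Errorf("Compact failed: %v", err)
	metrics.CompactTotal.WithLabelValues(metrics.ResultError).Inc()
	continue outer

Code at sql.go#L165-L167 (bookkeeping that is not reached when a compaction fails):

	// Record the final results for the outer loop
	compactRev = compactedRev
	targetCompactRev = currentRev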