Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: only update revisions if currentRev is updated #414

Merged
merged 1 commit into from
Feb 19, 2025

Conversation

svanharmelen
Copy link
Contributor

We encountered issues with our setup after updating Kine to version v0.13.7. The issues we encountered where similar to the ones described in #357 and were supposed to be fixed in #360 and #365.

While I noticed that our custom SQLite DSN was missing a few options that seem to help remedy the issues, I still had the impression something else was not playing nice as well.

So after building a local test setup and using the jobloader example to create a small load generator that matched our usage profile (guess in hindsight that wasn't even necessary) I was able to reproduce the issue.

We noticed in the test logs that at a certain moment in time the compaction process stopped all together, just as it does in our production setup. Before it stopped it already failed a few times (each time with the database is locked error), but each time (except for the last time) it resumed/tried again after 5 minutes.

What caught my eye when comparing logs with the test setup and our production logs is that it only stopped completely when there wasn't a single successful compaction done before the database is locked error was received. When looking at the code with that in mind it was easy to see the minor mistake in the code that was responsible for this behavior.

As currentRev is declared in the loop, it's reset to 0 on each iteration. So if it's not updated at least once targetCompactRev will be set to 0 at the end of the loop and will prevent any compaction to be done from that point on.

@svanharmelen svanharmelen requested a review from a team as a code owner February 16, 2025 17:53
@svanharmelen svanharmelen force-pushed the fix/compaction branch 2 times, most recently from bc670a1 to da47fe1 Compare February 17, 2025 00:06
@svanharmelen svanharmelen changed the title fix: set currentRev to compactRev before starting compaction fix: only update revisions if at least one iteration succeeded Feb 17, 2025
@svanharmelen svanharmelen force-pushed the fix/compaction branch 2 times, most recently from d540a7c to 12f4da0 Compare February 17, 2025 00:19
We encountered issues with our setup after updating Kine to version
v0.13.7. The issues we encountered where similar to the ones described
in k3s-io#357 and were supposed to be fixed in k3s-io#360 and k3s-io#365.

While I noticed that our custom SQLite DSN was missing a few options
that seem to help remedy the issues, I still had the impression
something else was not playing nice as well.

So after building a local test setup and using the `jobloader` example
to create a small load generator that matched our usage profile (guess
in hindsight that wasn't even necessary) I was able to reproduce the
issue.

We noticed in the test logs that at a certain moment in time the
compaction process stopped all together, just as it does in our
production setup. Before it stopped it already failed a few times (each
time with the `database is locked` error), but each time (except for the
last time) it resumed/tried again after 5 minutes.

What caught my eye when comparing logs with the test setup and our
production logs is that it only stopped completely when there wasn't a
single successful compaction done before the `database is locked` error
was received. When looking at the code with that in mind it was easy to
see the minor mistake in the code that was responsible for this
behavior.

As `currentRev` is declared in the loop, it's reset to 0 on each
iteration. So if it's not updated at least once `targetCompactRev` will
be set to 0 at the end of the loop and will prevent any compaction to be
done from that point on.

Signed-off-by: Sander van Harmelen <[email protected]>
@svanharmelen svanharmelen changed the title fix: only update revisions if at least one iteration succeeded fix: only update revisions if currentRev is updated Feb 17, 2025
@brandond
Copy link
Member

brandond commented Feb 18, 2025

Thanks for the PR. Can you comment more on what sqlite options you were (or were not) using in your DSN that caused this to occur? Were you able to find any way to reproduce this other than having database locking issues during the very first compaction attempt?

@svanharmelen
Copy link
Contributor Author

svanharmelen commented Feb 18, 2025

We were using sqlite:///path/to/file?mode=rwc&_journal=WAL&cache=shared, so we didn't have the busy_timeout and txlock options.

After reading a few recent issues and PR's we now updated it to sqlite:///path/to/file?_journal=WAL&cache=shared&_busy_timeout=30000&_txlock=immediate

We didn't try to reproduce it using any other errors than the database is locked error, but considering the current code the same bug could/should be triggered by any (SQL) error happening during the first compaction attempt.

At least that is how I read and understand the code 😊

@brandond brandond merged commit eeae736 into k3s-io:master Feb 19, 2025
3 checks passed
@svanharmelen svanharmelen deleted the fix/compaction branch February 19, 2025 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants