Description
Original issue 120 created by mvonmaszewski on 2012-09-24T14:42:07.000Z:
What steps will reproduce the problem?
- Using Linux.
- Within a process, open a database, open it again
- Second open succeeds. Expected failure.
This failure to block second open within same process hurts where the actually caller is a garbage collection based language such as Erlang or Java. Assume the database is opened as part of an object and closed as part of the object's destructor. The destructor is only called during garbage collection. This creates the potential for the close to be called after a subsequent reopen of the same database. And that just does really bad things to the Manifest file ... leading to death and destruction down the road.
Here is a known fix with comments:
// matthewv July 17, 2012 ... riak was overlapping activity on the
// same database directory due to the incorrect assumption that the
// code below worked within the riak process space. The fix leads to a choice:
// fcntl() only locks against external processes, not multiple locks from
// same process. But it has worked great with NFS forever
// flock() locks against both external processes and multiple locks from
// same process. It does not with NFS until Linux 2.6.12 ... other OS may vary.
// SmartOS/Solaris do not appear to support flock() though there is a man page.
// Pick the fcntl() or flock() below as appropriate for your environment / needs.
static int LockOrUnlock(int fd, bool lock) {
ifndef LOCK_UN
// works with NFS, but fails if same process attempts second access to
// db, i.e. makes second DB object to same directory
errno = 0;
struct flock f;
memset(&f, 0, sizeof(f));
f.l_type = (lock ? F_WRLCK : F_UNLCK);
f.l_whence = SEEK_SET;
f.l_start = 0;
f.l_len = 0; // Lock/unlock entire file
return fcntl(fd, F_SETLK, &f);
else
// does NOT work with NFS, but DOES work within same process
return flock(fd, (lock ? LOCK_EX : LOCK_UN) | LOCK_NB);
endif
}
Activity