Generate internal hasbits for singular proto3 implicit presence fields.

N.B.:

- This change is not intended to affect any well-defined protobuf behaviour in
  an observable way.
- The wire parsing codepath is not affected.
- This change only affects the C++ protobuf implementation (other languages are
  not affected).
- sizeof proto3 message objects may increase in 32-bit increments to
  accommodate hasbits.
- When profiled on some of Google's largest binaries, we have seen a code size
  increase of ~0.1%, which we consider to be a reasonable increase.

The title uses several terms that are worth defining:

- **singular**: a field that is not repeated, not part of a oneof, not an
  extension, and not lazy; just a field with a simple primitive type (number
  or boolean), or string/bytes.
- **proto3**: describes behaviour consistent with the "proto3" syntax. This is
  equivalent to `edition = "2023"` with
  `option features.field_presence = IMPLICIT;`.
- **implicit presence**: describes behaviour consistent with "non-optional"
  fields in proto3. This is described in more detail in
  https://protobuf.dev/programming-guides/field_presence/#presence-in-proto3-apis

This change makes C++ proto3 message classes carry hasbits for regular proto3
(i.e. non-`optional`) fields. It might make certain codepaths negligibly more
efficient, but a large improvement or regression is unlikely. A larger
performance improvement is expected from generating hasbits for repeated
fields; this change paves the way for that future work.

Hasbits in C++ will have slightly different semantics for implicit presence
fields. Previously, all hasbits were true field presence indicators: if the
hasbit is set, the field is guaranteed to be present; if the hasbit is unset,
the field is guaranteed to be missing.

This change introduces a new hasbit mode that I will call "hint hasbits",
denoted by a newly introduced enumerator,
`internal::cpp::HasbitMode::kHintHasbit`. For implicit presence fields, a field
can be mutated and still end up with a zero (default) value, especially through
`mutable_foo` APIs. To handle those cases correctly, we unconditionally set the
hasbit when `mutable_foo` is called, and then perform an additional emptiness
check before serializing the field onto the wire. An unset hint hasbit still
guarantees that the field is empty, so only fields whose hasbit is set need the
extra check.

PiperOrigin-RevId: 691945237
tonyliaoss authored and copybara-github committed Oct 31, 2024
1 parent 56580bd commit 3e82ed4
Showing 9 changed files with 603 additions and 96 deletions.
src/google/protobuf/compiler/cpp/field_generators/string_field.cc (21 additions, 2 deletions)

```diff
@@ -16,19 +16,23 @@
 #include "absl/log/absl_check.h"
 #include "absl/memory/memory.h"
 #include "absl/strings/str_cat.h"
+#include "absl/strings/string_view.h"
 #include "google/protobuf/compiler/cpp/field.h"
 #include "google/protobuf/compiler/cpp/field_generators/generators.h"
 #include "google/protobuf/compiler/cpp/helpers.h"
 #include "google/protobuf/compiler/cpp/options.h"
 #include "google/protobuf/descriptor.h"
 #include "google/protobuf/descriptor.pb.h"
 #include "google/protobuf/io/printer.h"
+#include "google/protobuf/port.h"
 
 namespace google {
 namespace protobuf {
 namespace compiler {
 namespace cpp {
 namespace {
+using ::google::protobuf::internal::cpp::GetFieldHasbitMode;
+using ::google::protobuf::internal::cpp::HasbitMode;
 using ::google::protobuf::internal::cpp::HasHasbit;
 using ::google::protobuf::io::AnnotationCollector;
 using Sub = ::google::protobuf::io::Printer::Sub;
@@ -533,6 +537,22 @@ void SingularString::GenerateClearingCode(io::Printer* p) const {
 )cc");
 }
 
+// Returns "ClearNonDefaultToEmpty" or "ClearToEmpty" depending on whether the
+// field might still point to the default string instance.
+absl::string_view GetClearFunctionForField(const FieldDescriptor* field) {
+switch (GetFieldHasbitMode(field)) {
+case HasbitMode::kNoHasbit:
+case HasbitMode::kHintHasbit:
+// TODO: b/376149315 - Would be nice to call ClearNonDefaultToEmpty for
+// hint hasbits too.
+return "ClearToEmpty";
+case HasbitMode::kTrueHasbit:
+return "ClearNonDefaultToEmpty";
+default:
+internal::Unreachable();
+}
+}
+
 void SingularString::GenerateMessageClearingCode(io::Printer* p) const {
 if (is_oneof()) {
 p->Emit(R"cc(
@@ -573,8 +593,7 @@ void SingularString::GenerateMessageClearingCode(io::Printer* p) const {
 return;
 }
 
-p->Emit({{"Clear",
-HasHasbit(field_) ? "ClearNonDefaultToEmpty" : "ClearToEmpty"}},
+p->Emit({{"Clear", GetClearFunctionForField(field_)}},
 R"cc(
 $field_$.$Clear$();
 )cc");
```
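The comment on `GetClearFunctionForField` hinges on whether a string field might still point at a shared default instance. A minimal toy model of that distinction follows; `ToyStringSlot` and all of its members are hypothetical and deliberately much simpler than protobuf's real string storage. It only shows why an unchecked clear is safe solely when the caller already knows the slot owns its string: the generator above uses the unchecked variant for true-hasbit fields, while hint-hasbit fields currently get the checked one (see the TODO).

```cpp
#include <string>

// Hypothetical toy model of a string field that may point at a shared
// default instance. This is NOT protobuf's real string storage.
class ToyStringSlot {
 public:
  ToyStringSlot() : ptr_(&DefaultInstance()) {}
  ToyStringSlot(const ToyStringSlot&) = delete;
  ToyStringSlot& operator=(const ToyStringSlot&) = delete;
  ~ToyStringSlot() {
    if (!IsDefault()) delete ptr_;
  }

  bool IsDefault() const { return ptr_ == &DefaultInstance(); }

  // Switches to an owned string the first time the field is mutated.
  std::string* Mutable() {
    if (IsDefault()) ptr_ = new std::string;
    return ptr_;
  }

  const std::string& Get() const { return *ptr_; }

  // Safe in every state: never touches the shared default instance.
  void ClearToEmpty() {
    if (!IsDefault()) ptr_->clear();
  }

  // Skips the check. Precondition: !IsDefault(); violating it would write
  // to the shared default instance.
  void ClearNonDefaultToEmpty() { ptr_->clear(); }

 private:
  // Intentionally leaked so it outlives every slot.
  static std::string& DefaultInstance() {
    static std::string* instance = new std::string;
    return *instance;
  }

  std::string* ptr_;
};
```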
src/google/protobuf/compiler/cpp/message.cc (61 additions, 19 deletions)

```diff
@@ -62,6 +62,8 @@ namespace cpp {
 namespace {
 using ::google::protobuf::internal::WireFormat;
 using ::google::protobuf::internal::WireFormatLite;
+using ::google::protobuf::internal::cpp::GetFieldHasbitMode;
+using ::google::protobuf::internal::cpp::HasbitMode;
 using ::google::protobuf::internal::cpp::HasHasbit;
 using Semantic = ::google::protobuf::io::AnnotationCollector::Semantic;
 using Sub = ::google::protobuf::io::Printer::Sub;
@@ -187,7 +189,7 @@ RunMap FindRuns(const std::vector<const FieldDescriptor*>& fields,
 
 void EmitNonDefaultCheck(io::Printer* p, const std::string& prefix,
 const FieldDescriptor* field) {
-ABSL_CHECK(!HasHasbit(field));
+ABSL_CHECK(GetFieldHasbitMode(field) != HasbitMode::kTrueHasbit);
 ABSL_CHECK(!field->is_repeated());
 ABSL_CHECK(!field->containing_oneof() || field->real_containing_oneof());
 
@@ -216,19 +218,27 @@ void EmitNonDefaultCheck(io::Printer* p, const std::string& prefix,
 }
 
 bool ShouldEmitNonDefaultCheck(const FieldDescriptor* field) {
-return (!field->is_repeated() && !field->containing_oneof()) ||
-field->real_containing_oneof();
+if (GetFieldHasbitMode(field) == HasbitMode::kTrueHasbit) {
+return false;
+}
+return !field->is_repeated();
 }
 
 // Emits an if-statement with a condition that evaluates to true if |field| is
 // considered non-default (will be sent over the wire), for message types
 // without true field presence. Should only be called if
 // !HasHasbit(field).
+// If |with_enclosing_braces_always| is set to true, will generate enclosing
+// braces even if nondefault check is not emitted -- i.e. code may look like:
+// {
+// // code...
+// }
+// If |with_enclosing_braces_always| is set to false, enclosing braces will not
+// be generated if nondefault check is not emitted.
 void MayEmitIfNonDefaultCheck(io::Printer* p, const std::string& prefix,
 const FieldDescriptor* field,
-absl::AnyInvocable<void()> emit_body) {
-ABSL_CHECK(!HasHasbit(field));
-
+absl::AnyInvocable<void()> emit_body,
+bool with_enclosing_braces_always) {
 if (ShouldEmitNonDefaultCheck(field)) {
 p->Emit(
 {
@@ -240,7 +250,10 @@ void MayEmitIfNonDefaultCheck(io::Printer* p, const std::string& prefix,
 $emit_body$;
 }
 )cc");
-} else {
+return;
+}
+
+if (with_enclosing_braces_always) {
 // In repeated fields, the same variable name may be emitted multiple
 // times, hence the need for emitting braces even when the if condition is
 // not necessary, so that the code looks like:
@@ -259,7 +272,11 @@ void MayEmitIfNonDefaultCheck(io::Printer* p, const std::string& prefix,
 $emit_body$;
 }
 )cc");
+return;
 }
+
+// If no enclosing braces need to be emitted, just emit the body directly.
+emit_body();
 }
 
 bool HasInternalHasMethod(const FieldDescriptor* field) {
@@ -1034,7 +1051,7 @@ void MessageGenerator::GenerateSingularFieldHasBits(
 )cc");
 return;
 }
-if (HasHasbit(field)) {
+if (GetFieldHasbitMode(field) == HasbitMode::kTrueHasbit) {
 auto v = p->WithVars(HasBitVars(field));
 p->Emit(
 {Sub{"ASSUME",
@@ -1244,7 +1261,8 @@ void MessageGenerator::EmitCheckAndUpdateByteSizeForField(
 };
 
 if (!HasHasbit(field)) {
-MayEmitIfNonDefaultCheck(p, "this_.", field, std::move(emit_body));
+MayEmitIfNonDefaultCheck(p, "this_.", field, std::move(emit_body),
+/*with_enclosing_braces_always=*/true);
 return;
 }
 if (field->options().weak()) {
@@ -1260,10 +1278,17 @@ void MessageGenerator::EmitCheckAndUpdateByteSizeForField(
 int has_bit_index = has_bit_indices_[field->index()];
 p->Emit({{"mask",
 absl::StrFormat("0x%08xu", uint32_t{1} << (has_bit_index % 32))},
-{"emit_body", [&] { emit_body(); }}},
+{"check_nondefault_and_emit_body",
+[&] {
+// Note that it's possible that the field has explicit presence.
+// In that case, nondefault check will not be emitted but
+// emit_body will still be emitted.
+MayEmitIfNonDefaultCheck(p, "this_.", field, std::move(emit_body),
+/*with_enclosing_braces_always=*/false);
+}}},
 R"cc(
 if (cached_has_bits & $mask$) {
-$emit_body$;
+$check_nondefault_and_emit_body$;
 }
 )cc");
 }
@@ -4227,9 +4252,10 @@ void MessageGenerator::GenerateClassSpecificMergeImpl(io::Printer* p) {
 } else if (field->is_optional() && !HasHasbit(field)) {
 // Merge semantics without true field presence: primitive fields are
 // merged only if non-zero (numeric) or non-empty (string).
-MayEmitIfNonDefaultCheck(p, "from.", field, /*emit_body=*/[&]() {
-generator.GenerateMergingCode(p);
-});
+MayEmitIfNonDefaultCheck(
+p, "from.", field,
+/*emit_body=*/[&]() { generator.GenerateMergingCode(p); },
+/*with_enclosing_braces_always=*/true);
 } else if (field->options().weak() ||
 cached_has_word_index != HasWordIndex(field)) {
 // Check hasbit, not using cached bits.
@@ -4250,10 +4276,20 @@ void MessageGenerator::GenerateClassSpecificMergeImpl(io::Printer* p) {
 format("if (cached_has_bits & 0x$1$u) {\n", mask);
 format.Indent();
 
-if (check_has_byte && IsPOD(field)) {
-generator.GenerateCopyConstructorCode(p);
+if (GetFieldHasbitMode(field) == HasbitMode::kHintHasbit) {
+// Merge semantics without true field presence: primitive fields are
+// merged only if non-zero (numeric) or non-empty (string).
+MayEmitIfNonDefaultCheck(
+p, "from.", field,
+/*emit_body=*/[&]() { generator.GenerateMergingCode(p); },
+/*with_enclosing_braces_always=*/false);
 } else {
-generator.GenerateMergingCode(p);
+ABSL_DCHECK(GetFieldHasbitMode(field) == HasbitMode::kTrueHasbit);
+if (check_has_byte && IsPOD(field)) {
+generator.GenerateCopyConstructorCode(p);
+} else {
+generator.GenerateMergingCode(p);
+}
 }
 
 format.Outdent();
@@ -4476,7 +4512,12 @@ void MessageGenerator::GenerateSerializeOneField(io::Printer* p,
 if (HasHasbit(field)) {
 p->Emit(
 {
-{"body", emit_body},
+{"body",
+[&]() {
+MayEmitIfNonDefaultCheck(p, "this_.", field,
+std::move(emit_body),
+/*with_enclosing_braces_always=*/false);
+}},
 {"cond",
 [&] {
 int has_bit_index = HasBitIndex(field);
@@ -4496,7 +4537,8 @@ void MessageGenerator::GenerateSerializeOneField(io::Printer* p,
 }
 )cc");
 } else if (field->is_optional()) {
-MayEmitIfNonDefaultCheck(p, "this_.", field, std::move(emit_body));
+MayEmitIfNonDefaultCheck(p, "this_.", field, std::move(emit_body),
+/*with_enclosing_braces_always=*/true);
 } else {
 emit_body();
 }
```
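The new `with_enclosing_braces_always` parameter gives `MayEmitIfNonDefaultCheck` three possible output shapes. The sketch below is a simplified, hypothetical model of that control flow: `ToyMayEmitIfNonDefaultCheck` appends to a `std::string` instead of driving `io::Printer`, and an empty `nondefault_condition` stands in for `ShouldEmitNonDefaultCheck(field)` returning false.

```cpp
#include <functional>
#include <string>

// Hypothetical, simplified model of MayEmitIfNonDefaultCheck's control flow.
// `out` stands in for io::Printer, `nondefault_condition` for the generated
// zero/emptiness check (empty string == no check needed, e.g. a field with a
// true hasbit), and `emit_body` for the field's generated code.
void ToyMayEmitIfNonDefaultCheck(std::string& out,
                                 const std::string& nondefault_condition,
                                 const std::function<void()>& emit_body,
                                 bool with_enclosing_braces_always) {
  if (!nondefault_condition.empty()) {
    // Shape 1: guard the body with the non-default check.
    out += "if (" + nondefault_condition + ") {\n";
    emit_body();
    out += "}\n";
    return;
  }
  if (with_enclosing_braces_always) {
    // Shape 2: bare braces keep locals declared by the body scoped.
    out += "{\n";
    emit_body();
    out += "}\n";
    return;
  }
  // Shape 3: no check and no braces; emit the body directly.
  emit_body();
}
```

Callers that are already inside an `if (cached_has_bits & mask)` block (the hasbit paths for byte size, merge and serialization above) pass `false`. A true-hasbit field therefore gets its body emitted directly, while a hint-hasbit field gets the extra non-default check nested inside the hasbit check.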
src/google/protobuf/descriptor.cc (20 additions, 2 deletions)

```diff
@@ -9798,9 +9798,27 @@ bool HasPreservingUnknownEnumSemantics(const FieldDescriptor* field) {
 return field->enum_type() != nullptr && !field->enum_type()->is_closed();
 }
 
+HasbitMode GetFieldHasbitMode(const FieldDescriptor* field) {
+// Do not generate hasbits for "real-oneof" and weak fields.
+if (field->real_containing_oneof() || field->options().weak()) {
+return HasbitMode::kNoHasbit;
+}
+
+// Explicit-presence fields always have true hasbits.
+if (field->has_presence()) {
+return HasbitMode::kTrueHasbit;
+}
+
+// Implicit presence fields.
+if (!field->is_repeated()) {
+return HasbitMode::kHintHasbit;
+}
+// We currently don't implement hasbits for implicit repeated fields.
+return HasbitMode::kNoHasbit;
+}
+
 bool HasHasbit(const FieldDescriptor* field) {
-return field->has_presence() && !field->real_containing_oneof() &&
-!field->options().weak();
+return GetFieldHasbitMode(field) != HasbitMode::kNoHasbit;
 }
 
 static bool IsVerifyUtf8(const FieldDescriptor* field, bool is_lite) {
```
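The decision order in `GetFieldHasbitMode` can be exercised in isolation. In the sketch below, `ToyFieldTraits` is a hypothetical plain struct whose booleans mirror the `FieldDescriptor` queries the function uses (`real_containing_oneof()`, `options().weak()`, `has_presence()`, `is_repeated()`); the branch order follows the diff, but nothing here is protobuf API.

```cpp
#include <cstdint>

// Mirrors the three values of internal::cpp::HasbitMode.
enum class ToyHasbitMode : uint8_t { kNoHasbit, kTrueHasbit, kHintHasbit };

// Hypothetical plain-data view of the descriptor properties consulted.
struct ToyFieldTraits {
  bool in_real_oneof = false;  // real_containing_oneof() != nullptr
  bool is_weak = false;        // options().weak()
  bool has_presence = false;   // has_presence()
  bool is_repeated = false;    // is_repeated()
};

ToyHasbitMode ToyGetFieldHasbitMode(const ToyFieldTraits& f) {
  // Real-oneof members and weak fields never get hasbits.
  if (f.in_real_oneof || f.is_weak) return ToyHasbitMode::kNoHasbit;
  // Explicit presence: the hasbit is the source of truth.
  if (f.has_presence) return ToyHasbitMode::kTrueHasbit;
  // Singular implicit presence: the hasbit is only a hint.
  if (!f.is_repeated) return ToyHasbitMode::kHintHasbit;
  // Implicit repeated fields: no hasbits yet.
  return ToyHasbitMode::kNoHasbit;
}

bool ToyHasHasbit(const ToyFieldTraits& f) {
  return ToyGetFieldHasbitMode(f) != ToyHasbitMode::kNoHasbit;
}
```

One consequence worth noting: `HasHasbit` used to return true only for explicit-presence fields, but with this definition it also returns true for singular implicit-presence fields. That is why call sites in `message.cc` that need true presence now compare against `HasbitMode::kTrueHasbit` explicitly.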
src/google/protobuf/descriptor.h (27 additions, 2 deletions)

```diff
@@ -42,6 +42,7 @@
 #include "google/protobuf/stubs/common.h"
 #include "absl/base/attributes.h"
 #include "absl/base/call_once.h"
+#include "absl/base/optimization.h"
 #include "absl/container/btree_map.h"
 #include "absl/container/flat_hash_map.h"
 #include "absl/functional/any_invocable.h"
@@ -2949,9 +2950,33 @@ constexpr int MaxMessageDeclarationNestingDepth() { return 32; }
 PROTOBUF_EXPORT bool HasPreservingUnknownEnumSemantics(
 const FieldDescriptor* field);
 
-PROTOBUF_EXPORT bool HasHasbit(const FieldDescriptor* field);
-
 #ifndef SWIG
+enum class HasbitMode : uint8_t {
+// Hasbits do not exist for the field.
+kNoHasbit,
+// Hasbits exist and indicate field presence.
+// Hasbit is set if and only if field is present.
+kTrueHasbit,
+// Hasbits exist and "hint at" field presence.
+// When hasbit is set, field is 'probably' present, but field accessors must
+// still check for field presence (i.e. false positives are possible).
+// When hasbit is unset, field is guaranteed to be not present.
+kHintHasbit,
+};
+
+// Returns the "hasbit mode" of the field. Depending on the implementation, a
+// field can:
+// - have no hasbits in its internal object (kNoHasbit);
+// - have hasbits where hasbit == 1 indicates field presence and hasbit == 0
+// indicates an unset field (kTrueHasbit);
+// - have hasbits where hasbit == 1 indicates "field is possibly modified" and
+// hasbit == 0 indicates "field is definitely missing" (kHintHasbit).
+PROTOBUF_EXPORT HasbitMode GetFieldHasbitMode(const FieldDescriptor* field);
+
+// Returns true if there are hasbits for the field.
+// Note that this does not correlate with "hazzer"s, i.e., whether has_foo APIs
+// are emitted.
+PROTOBUF_EXPORT bool HasHasbit(const FieldDescriptor* field);
 
 enum class Utf8CheckMode : uint8_t {
 kStrict = 0, // Parsing will fail if non UTF-8 data is in string fields.
```
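The comments on `HasbitMode` define what the hasbit alone lets a reader conclude about presence. The sketch below encodes that table; `ToyHasbitMode` mirrors the enum's three values, while `PresenceVerdict` and `ToyInterpretHasbit` are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>

// Mirrors the three values of internal::cpp::HasbitMode.
enum class ToyHasbitMode : uint8_t { kNoHasbit, kTrueHasbit, kHintHasbit };

// Hypothetical: what can be concluded from the hasbit alone.
enum class PresenceVerdict { kAbsent, kPresent, kInconclusive };

PresenceVerdict ToyInterpretHasbit(ToyHasbitMode mode, bool hasbit_set) {
  switch (mode) {
    case ToyHasbitMode::kTrueHasbit:
      // Set if and only if the field is present.
      return hasbit_set ? PresenceVerdict::kPresent : PresenceVerdict::kAbsent;
    case ToyHasbitMode::kHintHasbit:
      // Unset proves absence; set is only a hint (false positives possible),
      // so the field value must still be checked.
      return hasbit_set ? PresenceVerdict::kInconclusive
                        : PresenceVerdict::kAbsent;
    case ToyHasbitMode::kNoHasbit:
      break;
  }
  // No hasbit exists, so presence must come from elsewhere
  // (for example the field value, or the oneof case).
  return PresenceVerdict::kInconclusive;
}
```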