This issue serves as a discussion thread for reintroducing proto2 support into the C# protobuf implementation (Google.Protobuf). Here we can talk about reasons to reimplement proto2, reasons not to, and a possible implementation strategy.
Short History:
The original implementation supported proto2. However, upon the release of proto3, the project became proto3 only and everyone was told to use another implementation if they wanted proto2 (or just move to proto3).
Now features are being planned and added to the library that will make full proto2 support easy to add, such as extensions and unknown field support.
Motivation
Proto2 must be partially implemented for reflection and descriptor access. To access custom options in descriptors you need some sort of extensions API and if you already support proto2 then adding in the necessary code is a cinch. However if you don't have proto2 support you must make another implementation like CustomOptions in Google.Protobuf which may or may not support all the features custom options have to offer. For example, CustomOptions does not support repeated extensions.
With full proto2 support we can have full extension support for descriptors and forced internal access for descriptors can be removed, allowing people to create protoc plugins in C#. See google/protobuf#3487
This also gives people another option for the library they use if they don't want to use a reflection-based implementation or make their own.
Implementation
This implementation is prototyped on ObsidianMinor/protobuf/csharp/proto2
Each section is split into a Library section and a Compiler section. The Library section includes changes made to the Google.Protobuf library and the Compiler section includes changes made to the protoc compiler and code-gen.
Field presence (required/optional)
Library
To support field presence and required fields we have to add a new method to the IMessage interface. To keep it simple and similar to other implementations, this will be named IMessage.IsInitialized(). If we don't make this method then users will not have the ability to check if all required fields are present, causing any WriteTo calls to fail at runtime. This will break compatibility with v3.0.0 and will make the addition of proto2 a breaking change.
Compiler
The biggest changes will be in the compiler. In short, the generated code will have these changes:
- "Has" method in proto2
- "Clear" method in proto2 and proto3 (can be reused in reflection)
- "Has property checks" will use the new "Has" method in proto2, will keep old format in proto3
- Backing field is always nullable in proto2
- IsInitialized method implementation
- WriteTo has IsInitialized check before write (throws ArgumentException if not initialized)
- MergeFrom has IsInitialized check after merge (throws InvalidProtocolBufferException if not initialized)
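The changes listed above can be pictured with a small hand-written sketch. It is not protoc output: SampleMessage, its fields, and EnsureInitialized are hypothetical names, and the Has members are written as properties only because the group examples later in this issue use HasFoo that way. It assumes one required field ("id") and one optional field ("name").

```csharp
using System;

// Hypothetical stand-in for a generated proto2 message with one required
// field ("id") and one optional field ("name"). Not actual protoc output.
public sealed class SampleMessage
{
    private int? id_;       // null means "not set"
    private string name_;   // null means "not set"

    public int Id
    {
        get { return id_ ?? 0; }
        set { id_ = value; }
    }
    public bool HasId { get { return id_ != null; } }
    public void ClearId() { id_ = null; }

    public string Name
    {
        get { return name_ ?? ""; }
        set
        {
            if (value == null) throw new ArgumentNullException("value");
            name_ = value;
        }
    }
    public bool HasName { get { return name_ != null; } }
    public void ClearName() { name_ = null; }

    // True only when every required field has been set.
    public bool IsInitialized() { return HasId; }

    // Stand-in for the check WriteTo would perform before serializing.
    public void EnsureInitialized()
    {
        if (!IsInitialized())
        {
            throw new ArgumentException("Message is missing required fields.");
        }
    }
}

public static class FieldPresenceDemo
{
    public static void Main()
    {
        var message = new SampleMessage();
        Console.WriteLine(message.IsInitialized()); // False: "id" is not set

        message.Id = 0; // explicitly set to the type's default value
        Console.WriteLine(message.HasId);           // True: presence is tracked
        Console.WriteLine(message.IsInitialized()); // True

        message.ClearId();
        Console.WriteLine(message.HasId);           // False again
    }
}
```

The nullable backing field is what lets the message tell "set to the default value" apart from "never set", which a proto3-style non-nullable field cannot do.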
Default values
Compiler
This won't require many changes to the compiler since it already supports most primitives. Since bytes can have default values and can't be constants, the best way to make default byte strings is to use a static readonly field that is initialized with ByteString.FromBase64. If this default value is made public, it might make sense to create a constant field for all other types that can be made constant (in the prototype this has already been done).
To make sure string constants are kept 100% accurate, the current prototype writes them using UTF-16 escape sequences (\uXXXX). This can be modified if necessary.
This does not modify the existing proto3 code; however, the field initializer does use the new constant field in the prototype.
Example generated string default value
/// <summary>Default value for the "default_string" field</summary>
public const string DefaultStringDefaultValue = "\u0068\u0065\u006c\u006c\u006f";
private string defaultString_;
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public string DefaultString {
  get { return defaultString_ ?? DefaultStringDefaultValue; }
  set {
    defaultString_ = pb::ProtoPreconditions.CheckNotNull(value, "value");
  }
}
Example generated bytes default value
/// <summary>Default value for the "default_bytes" field</summary>
public static readonly pb::ByteString DefaultBytesDefaultValue = pb::ByteString.FromBase64("d29ybGQ=");
private pb::ByteString defaultBytes_;
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public pb::ByteString DefaultBytes {
  get { return defaultBytes_ ?? DefaultBytesDefaultValue; }
  set {
    defaultBytes_ = pb::ProtoPreconditions.CheckNotNull(value, "value");
  }
}
Extensions
Extensions are a big addition to the library and there are a few ways to go about different aspects of the implementation. This section is split into three subsections, each of which is further broken into library and compiler sections.
Extension messages
Library
This defines two new interfaces for messages: IExtensionMessage and IExtensionMessage<T>.
public interface IExtensionMessage : IMessage
{
  void RegisterExtension(Extension extension);
}
public interface IExtensionMessage<T> : IExtensionMessage, IMessage<T> where T : IExtensionMessage<T>
{
  void RegisterExtension<TValue>(Extension<T, TValue> extension);
  TValue GetExtension<TValue>(Extension<T, TValue> extension);
  // Repeated extensions use their own identifier type so this overload is distinct from the one above.
  RepeatedField<TValue> GetExtension<TValue>(RepeatedExtension<T, TValue> extension);
  void SetExtension<TValue>(Extension<T, TValue> extension, TValue value);
  bool HasExtension<TValue>(Extension<T, TValue> extension);
  void ClearExtension<TValue>(Extension<T, TValue> extension);
}
IExtensionMessage has only one method, which exists specifically so that the ExtensionRegistry can register extensions for the type. In generated code this method is explicitly implemented. Non-generic get, set, has, and clear methods on IExtensionMessage are worth considering, but their only benefit is easier reflection access.
Compiler
In the compiler, adding this only requires a few changes to MessageGenerator. If the message type has any extension ranges, the compiler should generate relevant extension code. So it should write code for IExtensionMessage, ExtensionSet, etc., even if no extensions are defined that we can see. Since we will be checking to see if we have extension ranges quite often, it would be helpful to introduce a new private bool field that is set during construction that we can check to see if we have ranges. In the prototype this field is named "has_extension_ranges_".
Example
Generated prototype message code for TestAllExtensions in unittest.proto
public sealed partial class TestAllExtensions : pb::IExtensionMessage<TestAllExtensions> {
private static readonly pb::MessageParser<TestAllExtensions> _parser = new pb::MessageParser<TestAllExtensions>(() => new TestAllExtensions());
private pb::UnknownFieldSet _unknownFields;
private pb::ExtensionSet<TestAllExtensions> _extensions = new pb::ExtensionSet<TestAllExtensions>();
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public static pb::MessageParser<TestAllExtensions> Parser { get { return _parser; } }
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public static pbr::MessageDescriptor Descriptor {
get { return global::Google.Protobuf.TestProtos.Proto2.UnittestReflection.Descriptor.MessageTypes[6]; }
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
pbr::MessageDescriptor pb::IMessage.Descriptor {
get { return Descriptor; }
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public TestAllExtensions() {
OnConstruction();
}
partial void OnConstruction();
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public TestAllExtensions(TestAllExtensions other) : this() {
_unknownFields = pb::UnknownFieldSet.Clone(other._unknownFields);
_extensions.MergeFrom(other._extensions);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public TestAllExtensions Clone() {
return new TestAllExtensions(this);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public override bool Equals(object other) {
return Equals(other as TestAllExtensions);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public bool Equals(TestAllExtensions other) {
if (ReferenceEquals(other, null)) {
return false;
}
if (ReferenceEquals(other, this)) {
return true;
}
if (!Equals(_extensions, other._extensions)) {
return false;
}
return Equals(_unknownFields, other._unknownFields);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public override int GetHashCode() {
int hash = 1;
hash ^= _extensions.GetHashCode();
if (_unknownFields != null) {
hash ^= _unknownFields.GetHashCode();
}
return hash;
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public override string ToString() {
return pb::JsonFormatter.ToDiagnosticString(this);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public void WriteTo(pb::CodedOutputStream output) {
pb::ProtoPreconditions.CheckInitialized(this);
if (_unknownFields != null) {
_unknownFields.WriteTo(output);
}
_extensions.WriteTo(output);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public int CalculateSize() {
int size = 0;
if (_unknownFields != null) {
size += _unknownFields.CalculateSize();
}
size += _extensions.CalculateSize();
return size;
}
void pb::IExtensionMessage.RegisterExtension(pb::Extension extension) {
_extensions.Register(extension);
}
public void RegisterExtension<TValue>(pb::Extension<TestAllExtensions, TValue> extension) {
_extensions.Register(extension);
}
public TValue GetExtension<TValue>(pb::Extension<TestAllExtensions, TValue> extension) {
return _extensions.Get(extension);
}
public pbc::RepeatedField<TValue> GetExtension<TValue>(pb::RepeatedExtension<TestAllExtensions, TValue> extension) {
return _extensions.Get(extension);
}
public void SetExtension<TValue>(pb::Extension<TestAllExtensions, TValue> extension, TValue value) {
_extensions.Set(extension, value);
}
public bool HasExtension<TValue>(pb::Extension<TestAllExtensions, TValue> extension) {
return _extensions.Has(extension);
}
public void ClearExtension<TValue>(pb::Extension<TestAllExtensions, TValue> extension) {
_extensions.Clear(extension);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public void MergeFrom(TestAllExtensions other) {
if (other == null) {
return;
}
_unknownFields = pb::UnknownFieldSet.MergeFrom(_unknownFields, other._unknownFields);
_extensions.MergeFrom(other._extensions);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public void MergeFrom(pb::CodedInputStream input) {
uint tag;
while ((tag = input.ReadTag()) != 0) {
switch(tag) {
default:
if (!_extensions.TryMergeFieldFrom(input)) {
_unknownFields = pb::UnknownFieldSet.MergeFieldFrom(_unknownFields, input);
}
break;
}
}
pb::ProtoPreconditions.CheckMergedRequiredFields(this);
}
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public bool IsInitialized() {
if (!_extensions.IsInitialized()) {
return false;
}
return true;
}
}
Extension sets
Library
Extension sets are made with two classes: an abstract base class that contains the logic for merging, writing, sizing, and organizing fields, and a generic class that inherits from the base class and provides generic access to get, set, clear, and check the presence of values.
Extension sets keep track of fields with two dictionaries, one keyed by extension identifier and the other keyed by tag. The dictionary keyed by identifier uses the default object hash code, since that is based on references and users should only be using the generated identifier instance to refer to extension fields. The dictionary keyed by tag is used by TryMergeFieldFrom to quickly merge a field based on the last tag read. Values are generic and implement the IExtensionValue interface, which defines the methods required for non-generic writing and merging. Instances of IExtensionValue are created through the extension identifier itself using the provided FieldCodec<T>.
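A minimal sketch of the two-dictionary layout described above may help. It is not the prototype's code: ExtensionId, ExtensionValue, and TinyExtensionSet are hypothetical stand-ins, values are limited to long, and the wire decoding that TryMergeFieldFrom would do through a FieldCodec<T> is replaced by an already-decoded argument.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical extension identifier. It does not override Equals or
// GetHashCode, so dictionary lookups compare identifiers by reference.
public sealed class ExtensionId
{
    public ExtensionId(uint tag) { Tag = tag; }
    public uint Tag { get; private set; }
}

// Hypothetical value holder; the proposal's IExtensionValue plays this role.
public sealed class ExtensionValue
{
    public long Value;
    public bool HasValue;
}

public sealed class TinyExtensionSet
{
    private readonly Dictionary<ExtensionId, ExtensionValue> valuesById =
        new Dictionary<ExtensionId, ExtensionValue>();
    private readonly Dictionary<uint, ExtensionValue> valuesByTag =
        new Dictionary<uint, ExtensionValue>();

    // Adds one shared holder to both dictionaries. Registering a second
    // identifier with an already-used tag throws ArgumentException.
    public void Register(ExtensionId extension)
    {
        if (valuesById.ContainsKey(extension)) return;
        var holder = new ExtensionValue();
        valuesByTag.Add(extension.Tag, holder);
        valuesById.Add(extension, holder);
    }

    public void Set(ExtensionId extension, long value)
    {
        Register(extension);
        ExtensionValue holder = valuesById[extension];
        holder.Value = value;
        holder.HasValue = true;
    }

    public bool Has(ExtensionId extension)
    {
        ExtensionValue holder;
        return valuesById.TryGetValue(extension, out holder) && holder.HasValue;
    }

    public long Get(ExtensionId extension)
    {
        ExtensionValue holder;
        return valuesById.TryGetValue(extension, out holder) ? holder.Value : 0;
    }

    // Parsing path: the tag just read from the wire selects the holder.
    // Returns false for unregistered tags so the caller can store the
    // field as an unknown field instead.
    public bool TryMergeField(uint tag, long decodedValue)
    {
        ExtensionValue holder;
        if (!valuesByTag.TryGetValue(tag, out holder)) return false;
        holder.Value = decodedValue;
        holder.HasValue = true;
        return true;
    }
}

public static class ExtensionSetDemo
{
    public static void Main()
    {
        var boo = new ExtensionId(64); // field 8, varint wire type: (8 << 3) | 0
        var set = new TinyExtensionSet();
        set.Register(boo);

        Console.WriteLine(set.TryMergeField(64, 123)); // True
        Console.WriteLine(set.Get(boo));               // 123
        Console.WriteLine(set.TryMergeField(72, 5));   // False: tag not registered
    }
}
```

Because both dictionaries point at the same holder object, a value merged by tag during parsing is immediately visible through the identifier-based accessors.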
Prototype API
public abstract class ExtensionSet
{
public void Register(Extension extension);
public bool TryMergeFieldFrom(CodedInputStream stream);
public void WriteTo(CodedOutputStream stream);
public int CalculateSize();
public bool IsInitialized();
public override int GetHashCode();
}
public sealed class ExtensionSet<TTarget> : ExtensionSet, IEquatable<ExtensionSet<TTarget>> where TTarget : IExtensionMessage<TTarget>
{
public ExtensionSet();
public void Register<TValue>(Extension<TTarget, TValue> extension);
public TValue Get<TValue>(Extension<TTarget, TValue> extension);
public RepeatedField<TValue> Get<TValue>(RepeatedExtension<TTarget, TValue> extension);
public void Set<TValue>(Extension<TTarget, TValue> extension, TValue value);
public bool Has<TValue>(Extension<TTarget, TValue> extension);
public void Clear<TValue>(Extension<TTarget, TValue> extension);
public void MergeFrom(ExtensionSet<TTarget> other);
public bool Equals(ExtensionSet<TTarget> other);
public override bool Equals(object obj);
public override int GetHashCode();
}
Extension identifiers
Library
The library will introduce three new classes:
- Extension
- Extension<T, V>
- RepeatedExtension<T, V>
Extension is a non-generic identifier that contains enough code for the non-generic portions of the ExtensionSet. It is also what gets passed to ExtensionRegistry to register extensions. Extension<T, V> and RepeatedExtension<T, V> implement the abstract members of the Extension class and are used in the generic portions of the ExtensionSet for getting, setting, and clearing field values. Both take in a FieldCodec<T> (including Extension<T, V>), allowing us to reuse existing code. However, FieldCodec<T> doesn't support everything we need for extensions, so some modifications will be made, including allowing different default values in the factory methods and adding a new delegate parameter for merging values. The merge value delegate in the prototype is defined as this:
public delegate void MergeDelegate(ref T field, T value);
This delegate is used exclusively by extension values and does not affect repeated fields.
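To illustrate why a merge delegate with a ref parameter is useful, here is a hedged sketch of two merge behaviours: replacing a scalar and merging into an existing message. Point, MergeScalar, and MergeMessage are hypothetical names, and the delegate is given an explicit type parameter here only so that it compiles on its own, outside a generic type such as FieldCodec<T>.

```csharp
using System;

// Hypothetical stand-in for a message type with two optional fields.
public sealed class Point
{
    public int? X;
    public int? Y;

    // Copies only the fields that are set on the other instance.
    public void MergeFrom(Point other)
    {
        if (other.X != null) X = other.X;
        if (other.Y != null) Y = other.Y;
    }
}

public static class MergeSketch
{
    // Same shape as the prototype's delegate, with T made an explicit type
    // parameter so the sketch compiles on its own.
    public delegate void MergeDelegate<T>(ref T field, T value);

    // Scalar semantics: the incoming value replaces the stored one.
    public static void MergeScalar(ref int field, int value)
    {
        field = value;
    }

    // Message semantics: merge into the existing instance when there is one.
    public static void MergeMessage(ref Point field, Point value)
    {
        if (field == null)
        {
            field = value;
        }
        else
        {
            field.MergeFrom(value);
        }
    }

    public static void Main()
    {
        MergeDelegate<int> scalarMerge = MergeScalar;
        MergeDelegate<Point> messageMerge = MergeMessage;

        int number = 1;
        scalarMerge(ref number, 2);
        Console.WriteLine(number); // 2

        Point point = new Point { X = 1 };
        messageMerge(ref point, new Point { Y = 5 });
        Console.WriteLine(point.X + "," + point.Y); // 1,5
    }
}
```

Passing the field by reference lets one delegate signature cover both cases: a scalar merge assigns a new value, while a message merge may either assign or mutate the existing instance.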
Example using extension identifiers
Foo bar = new Foo();
bar.SetExtension(BazPackageExtensions.Boo, 123);
Debug.Assert(bar.HasExtension(BazPackageExtensions.Boo));
Debug.Assert(bar.GetExtension(BazPackageExtensions.Boo) == 123);
bar.ClearExtension(BazPackageExtensions.Boo);
Example using repeated extension identifiers
Foo bar = new Foo();
RepeatedField<int> fars = bar.GetExtension(BazPackageExtensions.RepeatedFar);
fars.Add(10);
Debug.Assert(fars == bar.GetExtension(BazPackageExtensions.RepeatedFar));
Compiler
Extensions defined on the top level of a package are generated in a static container class named similarly to the reflection class. For example, for a package named Foo the reflection class is named FooReflection and the top-level extension class is named FooExtensions. Extensions declared inside other types are generated in a nested static Extensions class, similar to the Types class.
To generate extensions, FieldGeneratorBase receives a new virtual method called "GenerateExtensionCode". This method is overridden by field generators to create extension identifiers for their field definition. This also allows generators to reuse their "GenerateCodecCode" method if possible.
Example generated identifiers based on unit tests
public static partial class UnittestExtensions {
public static readonly pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, int> OptionalInt32Extension =
new pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, int>(pb::FieldCodec.ForInt32(8, 0));
// ...
public static readonly pb::RepeatedExtension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, int> RepeatedInt32Extension =
new pb::RepeatedExtension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, int>(pb::FieldCodec.ForInt32(248));
// ...
public static readonly pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, int> DefaultInt32Extension =
new pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, int>(pb::FieldCodec.ForInt32(488, 41));
// ...
}
public sealed partial class TestNestedExtension : IMessage<TestNestedExtension> {
// ...
#region Extensions
/// <summary>Container for extensions for other messages declared in the TestNestedExtension message type.</summary>
[global::System.Diagnostics.DebuggerNonUserCodeAttribute]
public static partial class Extensions {
/// <summary>
/// Check for bug where string extensions declared in tested scope did not
/// compile.
/// </summary>
public static readonly pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, string> Test =
new pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, string>(pb::FieldCodec.ForString(8018, "\u0074\u0065\u0073\u0074"));
/// <summary>
/// Used to test if generated extension name is correct when there are
/// underscores.
/// </summary>
public static readonly pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, string> NestedStringExtension =
new pb::Extension<global::Google.Protobuf.TestProtos.Proto2.TestAllExtensions, string>(pb::FieldCodec.ForString(8026, ""));
}
#endregion
}
Groups
Library
FieldCodec will receive a new factory called "ForGroup" and will take a start tag and end tag. CodedOutputStream will receive a new method to write groups called WriteGroup (currently this method just calls IMessage.WriteTo). CodedInputStream will receive a new method to read groups called ReadGroup.
CodedInputStream's ReadGroup prototype implementation works like this:
- Perform recursion checking
- Get old tag and calculate next end tag based on last tag
- Merge message using builder
- When ReadTag reads the calculated end tag, it returns zero.
- ReadGroup puts the old tag back into place
The prototype does not validate the end tag, so a mismatched end tag will not throw an error, unlike SkipGroup.
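The "calculate next end tag based on last tag" step relies on the protobuf wire format: a tag is the field number shifted left by three bits combined with the wire type, where start-group is wire type 3 and end-group is wire type 4. The sketch below shows that arithmetic only; GroupTags and its members are hypothetical names and not part of CodedInputStream.

```csharp
using System;

public static class GroupTags
{
    private const int TagTypeBits = 3;     // low bits of a tag hold the wire type
    private const uint TagTypeMask = 7;
    public const uint StartGroup = 3;      // protobuf wire type for group start
    public const uint EndGroup = 4;        // protobuf wire type for group end

    public static uint MakeTag(int fieldNumber, uint wireType)
    {
        return ((uint)fieldNumber << TagTypeBits) | wireType;
    }

    // Derives the end tag that closes the group opened by startTag.
    public static uint EndTagFor(uint startTag)
    {
        if ((startTag & TagTypeMask) != StartGroup)
        {
            throw new ArgumentException("Not a start-group tag.", "startTag");
        }
        return (startTag & ~TagTypeMask) | EndGroup;
    }

    public static void Main()
    {
        uint start = MakeTag(16, StartGroup);
        Console.WriteLine(start);            // 131
        Console.WriteLine(EndTagFor(start)); // 132
    }
}
```

Since the two wire types differ by one, the end tag is always the start tag plus one. A stricter ReadGroup could compare any end-group tag it reads against this expected value and reject a mismatch.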
UnknownFieldSet will also receive a field for group fields. In the prototype this is done with a list of UnknownFieldSet.
Compiler
Protoc will be modified to add GroupFieldGenerator and RepeatedGroupFieldGenerator. These classes inherit MessageFieldGenerator and RepeatedMessageFieldGenerator and override parsing code, serialization code, serialized size code, and codec code to support the different tag format of groups and use the WriteGroup, ReadGroup, and ForGroup functions in coded streams and codecs. Generators for oneof fields will also be created.
Generated parsing code
if (!HasFoo) {
foo_ = new Bar();
}
input.ReadGroup(foo_);
Generated serialization code
if (HasFoo) {
output.WriteRawTag(1, 2, 3, 4);
output.WriteGroup(Foo);
output.WriteRawTag(1, 2, 3, 5);
}
Generated serialized size code
if (HasFoo) {
size += 8 + pb::CodedOutputStream.ComputeGroupSize(Foo);
}
Generated codec code
pb::FieldCodec.ForGroup(67305985, 84083201, Foo.Parser);
Drawbacks
More maintenance just to support an old version. Proto2 is in maintenance mode, so a full proto2 implementation would be more work, both now and possibly in the future, than implementing extensions just for descriptors.