-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bson: Added Encoder and Decoder types for stream encoding/decoding. #127
Conversation
Those types are analog to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example.
JSON and YAML do that too, so let's be consistent.
Hi @maxnoel Adding stream support makes sense! Thanks for the PR! Really clear to read but I'm a little concerned that a maliciously crafted BSON document could be problematic:
They're solvable though! The first just needs a bounds check, and the second could be mitigated by having a The third I think the only realistic way is to include a warning about the possibility in the documentation so at least it's not a surprise - something along the lines of "careful with untrusted data" - I can imagine this being a problem for small, low-RAM cloud instances more than anything. Would you mind including test cases for the above? Thanks again! Dom |
Please consider this: |
Makes sense. I'll make the required modifications tomorrow, and restrict valid document sizes to [5B, 16MB] |
Strangely, the BSON spec defines the document size header as a signed int32, but: - No document can be smaller than 5 bytes (size header + null terminator). - MongoDB constrains BSON documents to 16 MiB at most. Therefore, documents whose header doesn't obey those limits are discarded and Decode returns ErrInvalidDocumentSize. In addition, we're reusing the handleErr panic handler in Decode to protect from unwanted panics in Unmarshal.
Done! I reused the handleErr panic handler instead of writing my own, which means I can't return a specific ErrCorruptDocument error. Let me know if you'd rather I write my own. |
Do you need anything else? |
Hi @maxnoel I'm getting through the backlog this afternoon - a quick glance looks great, should have a proper review in an hour or so 👍 Thanks very much! |
bson/stream.go
Outdated
) | ||
|
||
// ErrInvalidDocumentSize is an error returned when a BSON document's header | ||
// contains a size smaller than minDocumentSize or greater than maxDocumentSize. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minDocumentSize
and maxDocumentSize
isn't visible to the user
Hi @maxnoel Could you export the two const's referenced in the description and we can get this merged 👍 Great addition, it should make mgo easier to use for many people. And thanks for the slightly paranoid defensive programming ;) Dom |
Done! |
Thanks @maxnoel - we really appreciate it! |
No problems, glad I could contribute. Post-mortem analysis of Mongo dumps is something I do fairly frequently at work, and now that I'm doing it in Go instead of Python, copy/pasting my own decoder every time got old quickly ;) |
…lobalsign#127) * bson: Added Encoder and Decoder types for stream encoding/decoding. Those types are analog to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example. * Stream: NewEncoder and NewDecoder now return pointers. JSON and YAML do that too, so let's be consistent. * Stream decoder: added checks on document size limits, and panic handler. Strangely, the BSON spec defines the document size header as a signed int32, but: - No document can be smaller than 5 bytes (size header + null terminator). - MongoDB constrains BSON documents to 16 MiB at most. Therefore, documents whose header doesn't obey those limits are discarded and Decode returns ErrInvalidDocumentSize. In addition, we're reusing the handleErr panic handler in Decode to protect from unwanted panics in Unmarshal. * Exported MinDocumentSize and MaxDocumentSize consts.
Those types are analog to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example.