increase reliability when dealing with closed channels and closure notifications #34
If the emit channel is closed, the code will never reopen it, so we should return an error in this case: the rabbus.Rabbus has entered a failed state and cannot continue.
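One way to express "return an error instead of touching a closed channel" in Go is to pair the emit channel with a `done` channel that is closed exactly once when the Rabbus gives up. A minimal sketch, where the names (`done`, `emit`, `Message`, `ErrRabbusClosed`) are assumptions for illustration, not the library's actual API:

```go
// Illustrative sketch only; field and type names here are assumptions,
// not the actual rabbus API.
package rabbus

import "errors"

var ErrRabbusClosed = errors.New("rabbus: connection lost and not recoverable")

type Message struct{ Payload []byte }

type Rabbus struct {
	emit chan Message  // messages waiting to be published
	done chan struct{} // closed once, when the Rabbus enters its failed state
}

// EmitAsync hands the message to the publish loop, or reports that the
// Rabbus can no longer continue instead of panicking on a closed channel.
func (r *Rabbus) EmitAsync(m Message) error {
	select {
	case r.emit <- m:
		return nil
	case <-r.done:
		return ErrRabbusClosed
	}
}
```

Callers can then decide whether to retry, log, or shut down, rather than hitting a send-on-closed-channel panic.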
NotifyClose stacks channels into a slice (https://github.com/streadway/amqp/blob/master/channel.go#L444). This means that calling it multiple times results in a long slice of notification channels, none of which has anyone listening on it, because the loop iteration that registered them is now gone. When a close event occurred, the library would send the error on every one of those channels, so the most recent NotifyClose should still get its error message; but since the channels were generally unbuffered, it is just as likely that the close-notification process would block indefinitely on the channel send, waiting on a receiver that no longer exists.
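For reference, a hedged sketch of the pattern this points toward: call NotifyClose once per AMQP channel, outside any reconnect loop, and give it a small buffer so the library's broadcast-on-close never blocks on a receiver that has gone away (the function name `watchChannel` is illustrative):

```go
package consumer

import (
	"log"

	"github.com/streadway/amqp"
)

// watchChannel registers for close notifications exactly once, outside the
// reconnect loop, instead of registering a fresh unbuffered channel on every
// iteration.
func watchChannel(ch *amqp.Channel) {
	// Buffered so the library's close broadcast can never block waiting on
	// a receiver that is no longer selecting on this channel.
	closes := ch.NotifyClose(make(chan *amqp.Error, 1))

	for err := range closes {
		log.Printf("amqp channel closed: %v", err)
		// reconnect handling would go here
	}
}
```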
Finally, when NotifyClose triggered and re-establishing a consumer failed, we were closing all of the channels. (This enters the failed state noted above, where the emit channel is now closed.) But because `r.reconn` was created with a buffer length of 10, some of the buffered reconnect values kept being delivered while waiting for the channel to drain, allowing other reconnections to build back up. This hid the fact that we were actually closing the whole rabbus.Rabbus, which is clearly not what the code intended to do.
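One way to make that failure visible instead of hidden behind the buffer is to stop closing the emit/reconnect channels on a failed reconnect and instead close a single `done` channel that everything else selects on. A sketch under those assumptions (`done`, `reconn`, `redial`, `maxRetries` are hypothetical names, not the actual rabbus internals):

```go
// Sketch only: the names here are assumptions, not the actual rabbus internals.
package rabbus

const maxRetries = 10

type Rabbus struct {
	done   chan struct{} // closed exactly once, on unrecoverable failure
	reconn chan struct{} // buffered; one value per successful reconnect
}

func (r *Rabbus) reconnect() {
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err := r.redial(); err == nil {
			// Non-blocking notify so a full buffer never stalls recovery,
			// and a buffered value only ever means "a reconnect succeeded".
			select {
			case r.reconn <- struct{}{}:
			default:
			}
			return
		}
	}
	// Every attempt failed: make the failure visible by closing done,
	// instead of closing the emit/reconn channels and letting values still
	// buffered in reconn mask the fact that the whole Rabbus is gone.
	close(r.done)
}

func (r *Rabbus) redial() error {
	// Re-dial the broker and re-open channels and consumers here.
	return nil
}
```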