Add Raft cluster peer management (GetPeers, AddPeer, RemovePeer) #663
Merged
Adds support for automatic peer discovery and cluster joining for non-bootstrap nodes. Key changes:
- Add AddPeer RPC endpoint to allow nodes to join an existing cluster
- Implement TryConnectToCluster() to handle automatic cluster joining
- Forward AddPeer requests to leader if received by follower
- Add protobuf definitions for AddPeer request/response
- Update .gitignore to exclude raft node data files

This change allows new nodes to automatically discover and join an existing cluster by attempting to connect to configured peers until successful. Non-leader nodes will forward join requests to the current leader.
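The join loop described in this commit message ("attempting to connect to configured peers until successful") can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: `joinFunc`, `tryConnectToCluster`, `demoJoin`, the round limit, and the backoff are all hypothetical names and parameters standing in for the real AddPeer RPC call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// joinFunc is a hypothetical stand-in for sending an AddPeer RPC to one peer.
type joinFunc func(ctx context.Context, peerAddr string) error

// tryConnectToCluster walks the configured peers in order and returns the
// address of the first peer that accepts the join request. The whole list is
// retried up to maxRounds times, with a pause of backoff between rounds.
func tryConnectToCluster(ctx context.Context, peers []string, join joinFunc, maxRounds int, backoff time.Duration) (string, error) {
	lastErr := errors.New("no join attempt was made")
	for round := 0; round < maxRounds; round++ {
		for _, addr := range peers {
			err := join(ctx, addr)
			if err == nil {
				return addr, nil
			}
			lastErr = err
		}
		// Wait before the next round, but give up early if the context ends.
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(backoff):
		}
	}
	return "", fmt.Errorf("could not join cluster: %w", lastErr)
}

// demoJoin runs tryConnectToCluster against a fake cluster in which only the
// peer at address reachable accepts join requests.
func demoJoin(peers []string, reachable string) (string, error) {
	join := func(_ context.Context, addr string) error {
		if addr != reachable {
			return fmt.Errorf("peer %s unreachable", addr)
		}
		return nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return tryConnectToCluster(ctx, peers, join, 3, 10*time.Millisecond)
}

func main() {
	addr, err := demoJoin([]string{"10.0.0.1:2223", "10.0.0.2:2223"}, "10.0.0.2:2223")
	fmt.Println(addr, err)
}
```

In a real node the `join` callback would presumably dial the peer and send the AddPeer request; a follower that receives it would forward it to the leader, as the commit message describes.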
Add unit tests to verify the AddPeer behavior in both leader and follower nodes:
- Test successful peer addition when node is leader
…ower nodes
- Updated TestAddPeer to include checks for adding peers when the node is a leader and a follower.
- Introduced temporary directories for each node to ensure isolated testing environments.
- Added assertions to confirm that both new peers are successfully integrated into the cluster.
- Improved test reliability by implementing a loop to wait for both nodes to join the cluster before completing the test.
- Added RemovePeer RPC endpoint to the Raft service, allowing nodes to remove peers from the cluster.
- Introduced RemovePeerRequest and RemovePeerResponse message types in the protobuf definitions.
- Updated RaftNode to handle peer removal, including forwarding requests to the leader if the node is not the leader.
- Enhanced the README documentation to include details about the new RemovePeerRequest and RemovePeerResponse.
- Implemented unit tests for the RemovePeer functionality, ensuring correct behavior when removing both leader and follower nodes.
- Updated gRPC and HTTP handlers to support the new RemovePeer functionality.

This change enhances the Raft protocol's capability to manage cluster membership dynamically.
Implement Raft cluster management API endpoints to retrieve, add, and remove peers:
- Add GetPeers method to retrieve current Raft cluster peers
- Implement AddPeer and RemovePeer RPC endpoints for dynamic cluster membership
- Update API service definition to include new Raft peer management methods
- Add corresponding gRPC and HTTP handlers for peer management
- Enhance protobuf definitions with new message types for peer operations

These changes provide a comprehensive API for managing Raft cluster membership, allowing dynamic peer addition and removal.
Implement peer discovery and graceful shutdown in raft.go
…amic-adding-raft-642
Configure Tempo tracing service with explicit endpoint binding and add health checks to Docker Compose. This ensures proper tracing integration and service readiness in the observability stack.
…and openssl
Modify Dockerfile to use more flexible version constraints for alpine packages, allowing minor version updates while maintaining compatibility.
Enhance Raft node configuration to support optional TLS encryption:
- Add IsSecure, CertFile, and KeyFile fields to Raft configuration
- Implement conditional TLS server credentials based on secure mode
- Update default configuration to disable secure mode
- Modify gRPC server startup to handle secure and insecure modes
- Improve logging for gRPC server initialization

This change provides flexibility in configuring Raft node communication security while maintaining backward compatibility.
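As a rough illustration of the "conditional TLS server credentials" idea, the sketch below builds a `*tls.Config` only when secure mode is on. It uses the standard library rather than gRPC credentials, and the `raftConfig` struct and `serverTLSConfig` function are hypothetical; only the field names IsSecure, CertFile, and KeyFile come from the commit message.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
)

// raftConfig is a hypothetical, trimmed-down configuration struct. Only the
// three field names are taken from the commit message.
type raftConfig struct {
	IsSecure bool
	CertFile string
	KeyFile  string
}

// serverTLSConfig returns nil when secure mode is disabled, so the caller can
// start a plain-text listener. In secure mode it loads the key pair and
// fails loudly if either file is missing or unreadable.
func serverTLSConfig(cfg raftConfig) (*tls.Config, error) {
	if !cfg.IsSecure {
		return nil, nil
	}
	if cfg.CertFile == "" || cfg.KeyFile == "" {
		return nil, errors.New("secure mode requires both CertFile and KeyFile")
	}
	cert, err := tls.LoadX509KeyPair(cfg.CertFile, cfg.KeyFile)
	if err != nil {
		return nil, fmt.Errorf("load TLS key pair: %w", err)
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// Default configuration: secure mode disabled, no TLS config returned.
	tlsCfg, err := serverTLSConfig(raftConfig{})
	fmt.Println(tlsCfg == nil, err)

	// Secure mode without certificate paths is rejected.
	_, err = serverTLSConfig(raftConfig{IsSecure: true})
	fmt.Println(err)
}
```

Returning a nil config for the insecure case keeps the default behaviour unchanged, which matches the backward-compatibility goal stated above.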
Modify the HealthChecker to always return NOT_SERVING status by commenting out Raft-specific health checks.
Introduce getLeaderClient method to centralize leader client retrieval logic in AddPeer and RemovePeer methods. This reduces code duplication and improves maintainability by extracting the common pattern of finding the leader's gRPC address and obtaining a client.
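The pattern described here (look up the leader once, then forward the call) might look roughly like the sketch below. It is an in-memory illustration, not the PR's implementation: the `peerManager` interface stands in for the generated gRPC client, and the `node` struct, its fields, and `newPair` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// peerManager is a hypothetical stand-in for the generated gRPC client.
type peerManager interface {
	AddPeer(id, addr string) error
	RemovePeer(id string) error
}

// node is a simplified, in-memory model of a Raft node.
type node struct {
	id       string
	leaderID string
	peers    map[string]string      // peer ID -> address, kept by the leader
	clients  map[string]peerManager // peer ID -> client used for forwarding
}

// getLeaderClient centralizes the lookup shared by AddPeer and RemovePeer.
func (n *node) getLeaderClient() (peerManager, error) {
	if n.leaderID == "" {
		return nil, errors.New("no leader known")
	}
	client, ok := n.clients[n.leaderID]
	if !ok {
		return nil, fmt.Errorf("no client for leader %q", n.leaderID)
	}
	return client, nil
}

// AddPeer applies the change locally on the leader and forwards it otherwise.
func (n *node) AddPeer(id, addr string) error {
	if n.id == n.leaderID {
		n.peers[id] = addr
		return nil
	}
	leader, err := n.getLeaderClient()
	if err != nil {
		return fmt.Errorf("forward AddPeer: %w", err)
	}
	return leader.AddPeer(id, addr)
}

// RemovePeer follows the same leader-or-forward shape as AddPeer.
func (n *node) RemovePeer(id string) error {
	if n.id == n.leaderID {
		if _, ok := n.peers[id]; !ok {
			return fmt.Errorf("peer %q not found", id)
		}
		delete(n.peers, id)
		return nil
	}
	leader, err := n.getLeaderClient()
	if err != nil {
		return fmt.Errorf("forward RemovePeer: %w", err)
	}
	return leader.RemovePeer(id)
}

// newPair wires a follower to a leader so forwarding can be demonstrated.
func newPair() (*node, *node) {
	leader := &node{id: "n1", leaderID: "n1", peers: map[string]string{"n1": "10.0.0.1:2223"}}
	follower := &node{id: "n2", leaderID: "n1", clients: map[string]peerManager{"n1": leader}}
	return leader, follower
}

func main() {
	leader, follower := newPair()
	fmt.Println(follower.AddPeer("n3", "10.0.0.3:2223"), leader.peers["n3"])
}
```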
Implement TestSecureGRPCConfiguration to validate secure Raft node configuration:
- Add test cases for valid and invalid secure configuration scenarios
- Introduce helper function to generate self-signed certificates for testing
- Verify TLS credential handling and error conditions
- Ensure proper configuration of secure and non-secure gRPC nodes
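A test helper for generating a throwaway self-signed certificate could be written along these lines. This is a sketch using only the standard library; the function names `generateSelfSignedCert` and `validPair`, the ECDSA P-256 key type, and the one-hour validity window are assumptions, not details taken from the PR.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"net"
	"time"
)

// generateSelfSignedCert returns a PEM-encoded certificate and private key
// valid for localhost and 127.0.0.1. It is intended for tests only.
func generateSelfSignedCert() ([]byte, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, fmt.Errorf("generate key: %w", err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "localhost"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		DNSNames:              []string{"localhost"},
		IPAddresses:           []net.IP{net.ParseIP("127.0.0.1")},
		BasicConstraintsValid: true,
	}
	// Self-signed: the template acts as both subject and issuer.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, fmt.Errorf("create certificate: %w", err)
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, fmt.Errorf("marshal key: %w", err)
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}

// validPair reports whether the PEM data parses as a matching key pair.
func validPair(certPEM, keyPEM []byte) bool {
	_, err := tls.X509KeyPair(certPEM, keyPEM)
	return err == nil
}

func main() {
	certPEM, keyPEM, err := generateSelfSignedCert()
	if err != nil {
		panic(err)
	}
	fmt.Println("valid pair:", validPair(certPEM, keyPEM))
}
```

A test would typically write both PEM blobs to a temporary directory and pass the file paths as CertFile and KeyFile; the invalid-configuration cases can reuse the same helper with a corrupted or missing key file.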
…c allocation
Modify TestSecureGRPCConfiguration to use port 0 for dynamic port allocation, improving test reliability and preventing potential port conflicts during parallel test execution.
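For readers unfamiliar with the port 0 technique: binding to port 0 asks the operating system to pick any free port, and the actual port can be read back from the listener. A minimal sketch follows; the helper name `listenOnFreePort` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// listenOnFreePort binds to port 0 on loopback so the OS assigns a free
// port, then reports which port was chosen.
func listenOnFreePort() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	port := ln.Addr().(*net.TCPAddr).Port
	return ln, port, nil
}

func main() {
	ln, port, err := listenOnFreePort()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("listening on port", port)
}
```

Because each test gets its own OS-assigned port, parallel tests do not race for a hardcoded port number.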
…oring integrate Hashicorp's logger adapter using zerolog. This simplifies the Raft node initialization by leveraging built-in logging mechanisms and removing redundant leadership tracking logic.
…sertions
Enhance HTTP server test by:
- Adding error handling for gRPC server startup
- Using require assertions for clearer test failures
- Implementing panic recovery for gRPC server
- Improving server startup error detection
Remove the `monitorLeadership()` method from the Raft node initialization, which was previously commented out. This simplifies the node startup process and removes unnecessary leadership tracking logic that was likely superseded by more efficient Raft cluster management mechanisms.
Implement thorough test suite for RemovePeer API method, covering:
- Successful peer removal
- Error handling for uninitialized Raft node
- Handling of non-existent peer removal
- Proper gRPC error code validation
Implement thorough test suite for GetPeers API method, covering:
- Successful peer retrieval with a leader node
- Error handling for uninitialized Raft node
- Validation of returned peers map structure
- Proper gRPC error code validation
Enhance peer management functionality by introducing gRPC address tracking:
- Update AddPeer method to include gRPC address parameter
- Modify AddPeerRequest and related protobuf definitions
- Extend peer addition logic to store gRPC address in local peers list
- Update API and RPC methods to handle new gRPC address field
- Add comprehensive test cases for AddPeer with gRPC address validation
…error handling
Enhance peer management methods by:
- Adding context with timeout for AddPeer and RemovePeer operations
- Improving error messages with more context
- Using getter methods for request fields
- Updating test cases to reflect new method signatures
- Adding more robust error handling and logging
Implement TestFSMPeerOperations to validate Raft cluster peer management:
- Create a multi-node Raft cluster with bootstrap and follower nodes
- Verify peer synchronization across nodes
- Ensure consistent peer information in FSM state
- Validate leader election and consistency
- Add robust assertions for peer addition and state tracking
Improve peer management by:
- Adding CommandAddPeer and CommandRemovePeer to FSM
- Implementing peer synchronization across Raft cluster
- Adding waitForLeader method with retry mechanism
- Enhancing error handling and logging for peer operations
- Updating leader client retrieval with more reliable mechanism
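To make the FSM part of this commit more concrete, the sketch below shows one way peer add/remove commands could be encoded as log entries and applied to an in-memory peer table. It deliberately avoids the hashicorp/raft types; the JSON encoding, the `command` and `peerPayload` structs, and the lowercase constant names are assumptions for illustration and may differ from the PR's CommandAddPeer and CommandRemovePeer definitions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync"
)

type commandType string

// Hypothetical command identifiers, modelled on CommandAddPeer and
// CommandRemovePeer from the commit message.
const (
	commandAddPeer    commandType = "add_peer"
	commandRemovePeer commandType = "remove_peer"
)

// peerPayload carries the data replicated for one peer.
type peerPayload struct {
	ID       string `json:"id"`
	RaftAddr string `json:"raft_addr"`
	GRPCAddr string `json:"grpc_addr"`
}

type command struct {
	Type commandType `json:"type"`
	Peer peerPayload `json:"peer"`
}

// peerFSM holds the peer table that every node rebuilds from the log.
type peerFSM struct {
	mu    sync.RWMutex
	peers map[string]peerPayload
}

func newPeerFSM() *peerFSM {
	return &peerFSM{peers: make(map[string]peerPayload)}
}

// apply decodes one log entry and mutates the peer table accordingly.
func (f *peerFSM) apply(entry []byte) error {
	var cmd command
	if err := json.Unmarshal(entry, &cmd); err != nil {
		return fmt.Errorf("decode command: %w", err)
	}
	if cmd.Peer.ID == "" {
		return errors.New("peer ID is empty")
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	switch cmd.Type {
	case commandAddPeer:
		if cmd.Peer.RaftAddr == "" || cmd.Peer.GRPCAddr == "" {
			return errors.New("peer addresses must not be empty")
		}
		f.peers[cmd.Peer.ID] = cmd.Peer
	case commandRemovePeer:
		delete(f.peers, cmd.Peer.ID)
	default:
		return fmt.Errorf("unknown command type %q", cmd.Type)
	}
	return nil
}

// peerCount returns the number of peers currently tracked.
func (f *peerFSM) peerCount() int {
	f.mu.RLock()
	defer f.mu.RUnlock()
	return len(f.peers)
}

// encode serializes a command the way the leader would before replicating it.
func encode(cmd command) []byte {
	data, _ := json.Marshal(cmd) // plain string fields cannot fail to marshal
	return data
}

func main() {
	fsm := newPeerFSM()
	add := command{Type: commandAddPeer, Peer: peerPayload{ID: "n2", RaftAddr: "10.0.0.2:2223", GRPCAddr: "10.0.0.2:50051"}}
	fmt.Println(fsm.apply(encode(add)), fsm.peerCount())
}
```

Because every node applies the same committed entries in the same order, each node ends up with the same peer table, which is the synchronization property the commit is after.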
Update Raft node and RPC methods to accept context parameter:
- Modify Apply method to include context for better request tracing and timeout control
- Update forwardToLeader and applyInternal methods to use context
- Adjust RPC server methods to pass context through
- Refactor test cases to provide context when calling Apply
- Improve error handling and request forwarding with context support
…ation
Implement a new GetPeerInfo RPC method to support peer synchronization across Raft cluster:
- Add GetPeerInfoRequest and GetPeerInfoResponse protobuf definitions
- Create RPC method to query peer information from other nodes
- Implement peer synchronization mechanism with periodic checks
- Add method to query and update peer information across cluster
- Enhance peer management with cross-node information retrieval
Update testcontainers-go dependency to the latest version, which includes potential bug fixes and improvements.
Remove the placeholder DiscoverPeers method that was not implemented, keeping the codebase clean and focused on existing peer management functionality.
Improve peer synchronization and RPC method implementation:
- Remove error handling from syncPeers method
- Simplify StartPeerSynchronizer goroutine
- Update GetPeerInfo RPC method to use getter method
- Remove unnecessary logging and error checks
…guration management
Break down Raft node creation into smaller, focused functions:
- Extract node configuration initialization
- Create separate methods for FSM, stores, and transport setup
- Improve error handling and logging during node creation
- Add context cancellation for peer synchronization
- Enhance cluster configuration and bootstrapping logic
Signed-off-by: Sina Darbouy <[email protected]>
Modify package version constraints to use more flexible version matching for git, make, and openssl packages, allowing minor version updates while maintaining compatibility.
Update the protoc-gen-go version in generated protobuf files for both API and Raft services, ensuring compatibility and using the latest minor version.
Remove hardcoded ARM64 architecture setting in docker-compose-raft.yaml, allowing for more flexible deployment configurations.
Update load balancer strategies to accept a context parameter, enabling timeout and cancellation support for proxy selection. This change introduces context handling in:
- ConsistentHash
- Random
- RoundRobin
- WeightedRoundRobin

Also add a FindProxyTimeout constant in the server to provide a default timeout for proxy selection.
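The shape of a context-aware strategy can be sketched as below, using round robin as the example. This is an illustration only: the `roundRobin` type, its `NextProxy` method signature, the string-based proxy list, and the 100 ms value given to `findProxyTimeout` are assumptions, not the project's actual definitions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// findProxyTimeout mirrors the FindProxyTimeout constant named in the commit
// message. The value used here is an assumption.
const findProxyTimeout = 100 * time.Millisecond

// roundRobin is a hypothetical, simplified strategy over proxy names.
type roundRobin struct {
	mu      sync.Mutex
	next    int
	proxies []string
}

// NextProxy returns the next proxy in rotation, or an error if the context
// is already cancelled or expired, or if there is nothing to choose from.
func (r *roundRobin) NextProxy(ctx context.Context) (string, error) {
	if err := ctx.Err(); err != nil {
		return "", fmt.Errorf("proxy selection aborted: %w", err)
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.proxies) == 0 {
		return "", errors.New("no proxies available")
	}
	proxy := r.proxies[r.next%len(r.proxies)]
	r.next++
	return proxy, nil
}

// pick shows how a caller might bound selection with the default timeout.
func pick(r *roundRobin) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), findProxyTimeout)
	defer cancel()
	return r.NextProxy(ctx)
}

// pickCancelled shows the behaviour when the caller has already given up.
func pickCancelled(r *roundRobin) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := r.NextProxy(ctx)
	return err
}

func main() {
	r := &roundRobin{proxies: []string{"proxy-a", "proxy-b"}}
	first, _ := pick(r)
	second, _ := pick(r)
	fmt.Println(first, second, pickCancelled(r))
}
```

Checking `ctx.Err()` up front is the simplest form of cancellation support; a strategy that blocks (for example, one that consults Raft state) would presumably select on `ctx.Done()` as well.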
Add graceful handling for raft cluster bootstrap when the cluster is already initialized, preventing unnecessary errors and improving startup robustness. Log an informative message when skipping bootstrap due to existing cluster configuration.
Add comments to clarify the purpose of AddPeer and GetPeerInfo gRPC request handlers, improving code readability and documentation for Raft RPC server methods.
Improve input validation and error handling for Raft RPC methods:
- Add null and empty field checks for AddPeer and RemovePeer requests
- Provide more descriptive error messages
- Refactor GetPeerInfo to handle non-existent peer cases
- Ensure consistent error response formatting
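The nil and empty-field checks mentioned above could take a form like the following sketch. The `addPeerRequest` struct and its field names are hypothetical stand-ins for the generated protobuf message; the real handler would presumably use the message's getter methods and return gRPC status errors.

```go
package main

import (
	"errors"
	"fmt"
)

// addPeerRequest is a hypothetical stand-in for the generated protobuf type.
type addPeerRequest struct {
	PeerID      string
	PeerAddress string
	GRPCAddress string
}

// validateAddPeer rejects nil requests and requests with empty fields,
// returning a descriptive error for the first problem found.
func validateAddPeer(req *addPeerRequest) error {
	if req == nil {
		return errors.New("AddPeer request is nil")
	}
	if req.PeerID == "" {
		return errors.New("AddPeer request: peer ID is empty")
	}
	if req.PeerAddress == "" {
		return errors.New("AddPeer request: peer address is empty")
	}
	if req.GRPCAddress == "" {
		return errors.New("AddPeer request: gRPC address is empty")
	}
	return nil
}

func main() {
	fmt.Println(validateAddPeer(nil))
	fmt.Println(validateAddPeer(&addPeerRequest{PeerID: "n2"}))
	fmt.Println(validateAddPeer(&addPeerRequest{PeerID: "n2", PeerAddress: "10.0.0.2:2223", GRPCAddress: "10.0.0.2:50051"}))
}
```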
…or creation
Simplify error creation in RPC methods by using errors.New instead of fmt.Errorf, improving code consistency and removing unnecessary formatting overhead.
Enhance Raft cluster operations with:
- Robust LeaveCluster method with timeout and logging
- Comprehensive peer validation in FSM
- Metrics tracking for peer additions, updates, and removals
- Improved error handling and state checks
- Added validation for peer payload addresses
…ions
Improve API documentation for Raft peer-related methods and messages:
- Add detailed descriptions and examples for GetPeers, AddPeer, and RemovePeer RPC methods
- Include comprehensive field descriptions for PeersResponse, PeerInfo, AddPeerRequest, and related response messages
- Update Swagger/OpenAPI specifications with more informative operation and schema descriptions
- Improve README.md documentation for peer-related message fields
Implement thorough test cases for LeaveCluster method covering:
- Single node cluster
- Follower leaving multi-node cluster
- Leader leaving multi-node cluster
- Handling nil node scenarios
- Verifying cluster state after node departure

Enhance test coverage for Raft cluster management and node removal logic.
Implement thorough test suites for Raft Node methods:
- GetPeers: Test peer retrieval in various cluster configurations
- GetLeaderClient: Verify leader client retrieval in single and multi-node clusters
- Shutdown: Validate node shutdown behavior with different scenarios

Enhance test coverage for Raft node management, improving reliability and robustness of cluster operations.
Implement thorough test suites for Raft RPC server methods:
- AddPeer: Test peer addition with various input scenarios
- RemovePeer: Validate peer removal in different conditions
- GetPeerInfo: Verify peer information retrieval

Enhance test coverage for Raft RPC server operations, improving reliability and robustness of cluster management methods.
Overview
Packages and Vulnerabilities (55 package changes and 4 vulnerability changes)
mostafa
reviewed
Feb 22, 2025
Simplify Raft RPC response messages by removing the redundant error field across multiple protobuf message types:
- ForwardApplyResponse
- AddPeerResponse
- RemovePeerResponse

Update related code to handle errors without relying on the error string field, improving error handling consistency and reducing unnecessary message complexity.
Modify Raft peer metrics to support node-specific tracking:
- Convert RaftPeerRemovals, RaftPeerAdditions, and RaftPeerUpdates to labeled CounterVec
- Remove redundant RaftPeerUpdates metric
- Update metric incrementation to include node ID labels
- Simplify peer tracking logic in FSM Apply method
Move the FindProxyTimeout constant from network/server.go to config/constants.go to centralize configuration and improve code organization. Update the server implementation to use the new constant location.
mostafa
approved these changes
Feb 23, 2025
Thank you for your contribution! LGTM!
Ticket(s)
#642
Description
This PR implements Raft peer management APIs to enable adding, removing, and querying peers in the Raft cluster. Key changes include:
Development Checklist
make gen-docs command.
Legal Checklist