Unreachable Bikes Workflow
This document outlines the workflow for identifying and managing unreachable bikes. The process runs as a GCP Cloud Run job; the job implementation lives in the forest-jobs repo and interacts with the Wunder backend API.
Workflow Description
The "Unreachable bikes" workflow is a scheduled job that runs periodically to keep the state of the bikes in the Wunder system accurate. Its primary goal is to identify bikes that are Active but no longer respond to commands (or state refreshes) and mark them as Unreachable. It also identifies bikes that were Unreachable and have become responsive again, and marks them back as Active.
High-Level Process
- Trigger: The entire flow is triggered by a scheduled cron job hosted in GCP Cloud Scheduler, which runs every 10 minutes.
- Fetch Data: The job starts by fetching all vehicles from the Wunder backend API v2.
- Vehicle Evaluation: For each vehicle, the job evaluates its state based on two main scenarios:
  - Scenario A: Active bike might be unreachable. If a bike is Active, not reserved, and hasn't reported its position for more than 10 minutes, it's considered potentially unreachable.
  - Scenario B: Unreachable bike might be active again. If a bike is marked as Unreachable and is not reserved, the job checks if it has come back online.
- State Update: Based on the evaluation, the job calls a Wunder endpoint (/refresh-state) to determine the bike's actual reachability. If the endpoint returns an error, the bike is considered unreachable. If it returns a success status, the bike is reachable.
- Notifications: The job sends notifications to the #alerts-vehicles-unreachable Slack channel for every state change, providing visibility into which bikes have become unreachable or have recovered.
Technical Specification
This workflow is implemented as a Go application that runs as a GCP Cloud Run job. It is maintained in the forest-jobs and forest-infrastructure repos.
Repos:
- forest-jobs
- forest-infrastructure
Key Components
- WunderClient: A client to interact with Wunder's Backend API v2.
  - GetAllVehicles: Fetches all vehicles from /api/v2/vehicles/cached.
  - IsVehicleReachable: Pings a vehicle by calling GET /api/v2/vehicles/{id}/refresh-state. A 200 OK response means the vehicle is reachable. Any other status code or a network error implies it is unreachable.
  - ChangeVehicleState: Updates a vehicle's state with a PATCH request to /api/v2/vehicles/{id}.
- VehicleProcessor: Contains the core logic to process each vehicle.
- SlackNotifier: Sends formatted messages to a Slack webhook for real-time alerts.
Logic Details
- Vehicle States based on [Wunder vehicle states](https://humanforest.backend.fleetbird.eu/vehicle-state/index):
  - VehicleActive: 0
  - VehicleUnreachable: 3
- Idle Threshold: A vehicle is considered idle if its positionOriginatedAt timestamp is older than 10 minutes (MaxMinutesInactive).
Execution Flow
- Configuration: The job loads its configuration from environment variables, including Wunder API credentials and the Slack webhook URL. A DRY_RUN mode is available for testing, which prevents any state changes from being made.
- handleIdleVehicle:
  - Trigger: Vehicle is Active, not reserved, and inactive for > 10 minutes.
  - Action: Calls IsVehicleReachable.
  - Outcome: If the vehicle is not reachable, its state is changed to VehicleUnreachable, and an "Unreachable vehicle" notification is sent to Slack.
- handleUnreachableVehicle:
  - Trigger: Vehicle is Unreachable and not reserved.
  - Action: Calls IsVehicleReachable.
  - Outcome: If the vehicle is reachable, its state is changed back to VehicleActive, and a "Vehicle reachable again!" notification is sent to Slack.
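The two handlers and the dry-run guard can be sketched together as below. This is an illustration under stated assumptions, not the forest-jobs implementation: the Vehicle fields, the client and notifier interfaces, the transition helper, and the in-memory fakes are all hypothetical. Only the state values, the trigger conditions, and the notification texts come from this document.

```go
package main

import "fmt"

const (
	VehicleActive      = 0
	VehicleUnreachable = 3
)

// Vehicle holds only the fields this sketch needs (hypothetical shape).
type Vehicle struct {
	ID       int
	State    int
	Reserved bool
	Idle     bool // last position older than the idle threshold
}

type client interface {
	IsVehicleReachable(id int) bool
	ChangeVehicleState(id, state int) error
}

type notifier interface{ Notify(msg string) }

type VehicleProcessor struct {
	Client client
	Slack  notifier
	DryRun bool
}

// Process routes a vehicle to the idle or the unreachable scenario.
func (p *VehicleProcessor) Process(v Vehicle) {
	switch {
	case v.Reserved:
		return // reserved vehicles are never touched
	case v.State == VehicleActive && v.Idle:
		p.transition(v, false, VehicleUnreachable, "Unreachable vehicle")
	case v.State == VehicleUnreachable:
		p.transition(v, true, VehicleActive, "Vehicle reachable again!")
	}
}

// transition changes state and notifies only when the ping result matches
// the expected outcome and dry-run mode is off.
func (p *VehicleProcessor) transition(v Vehicle, wantReachable bool, newState int, msg string) {
	if p.Client.IsVehicleReachable(v.ID) != wantReachable || p.DryRun {
		return
	}
	if err := p.Client.ChangeVehicleState(v.ID, newState); err != nil {
		return
	}
	p.Slack.Notify(fmt.Sprintf("%s: %d", msg, v.ID))
}

// In-memory fakes so the sketch runs without network access.
type fakeClient struct {
	reachable map[int]bool
	states    map[int]int
}

func (f *fakeClient) IsVehicleReachable(id int) bool { return f.reachable[id] }
func (f *fakeClient) ChangeVehicleState(id, state int) error {
	f.states[id] = state
	return nil
}

type fakeSlack struct{ msgs []string }

func (f *fakeSlack) Notify(msg string) { f.msgs = append(f.msgs, msg) }

func main() {
	c := &fakeClient{reachable: map[int]bool{1: false, 2: true}, states: map[int]int{}}
	s := &fakeSlack{}
	p := &VehicleProcessor{Client: c, Slack: s}
	p.Process(Vehicle{ID: 1, State: VehicleActive, Idle: true})
	p.Process(Vehicle{ID: 2, State: VehicleUnreachable})
	fmt.Println(c.states, s.msgs)
}
```

In this sketch a dry run still pings the vehicle but skips both the state change and the Slack message. Whether the real job also suppresses notifications in dry-run mode should be checked against the code.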
This automated process ensures that the operational state of the fleet is correctly represented in our systems, which is crucial for both user experience and operational efficiency.
Sequence Diagram
The following diagram explains how the process works. It was generated by AI and then reviewed, but always compare it against the current code implementation in the repo.
sequenceDiagram
autonumber
participant Main as main()
participant WC as WunderClient
participant VP as VehicleProcessor
participant SN as SlackNotifier
participant Slack as Slack API
Main->>Main: loadConfig()
Main->>WC: NewWunderClient()
Main->>SN: NewSlackNotifier()
Main->>VP: NewVehicleProcessor()
Main->>WC: GetAllVehicles()
WC-->>Main: List<Vehicle>
loop For each batch of 500 vehicles
Main->>VP: Process(vehicle) (goroutine)
alt Vehicle eligible for idle check (state: Active && !Reserved && >10min)
VP->>WC: IsVehicleReachable(id)
WC-->>VP: reachable?
alt reachable == false AND DryRun == false
VP->>WC: ChangeVehicleState(id, Unreachable)
WC-->>VP: OK
VP->>SN: Notify( Unreachable )
SN->>Slack: POST webhook
Slack-->>SN: 200 OK
VP->>VP: processedIdle++
else reachable == true OR DryRun == true
VP->>VP: No state change
end
else Vehicle is unreachable && not reserved
VP->>WC: IsVehicleReachable(id)
WC-->>VP: reachable?
alt reachable == true AND DryRun == false
VP->>WC: ChangeVehicleState(id, Active)
WC-->>VP: OK
VP->>SN: Notify( Reachable again )
SN->>Slack: POST webhook
Slack-->>SN: 200 OK
VP->>VP: processedReachable++
else reachable == false OR DryRun == true
VP->>VP: No state change
end
end
end
Main->>Main: wgSlack.Wait()
Main->>Main: print stats (processedIdle, processedReachable)
