Unreachable Bikes Workflow

This document outlines the workflow for identifying and managing unreachable bikes. The process runs as a GCP Cloud Run job; its implementation lives in the forest-jobs repo and interacts with the Wunder backend API.

Workflow Description

The "Unreachable bikes" workflow is a scheduled job that runs periodically to keep the state of the bikes in the Wunder system accurate. Its primary goal is to identify bikes that are Active but no longer respond to commands (or state refreshes) and mark them as Unreachable. It also identifies bikes that were Unreachable and have become responsive again, and marks them as Active again.

High-Level Process

  1. The entire flow is triggered by a scheduled cron job hosted in GCP Cloud Scheduler, which runs every 10 minutes.
  2. Fetch Data: The job starts by fetching all vehicles from the Wunder Backend API v2.
  3. Vehicle Evaluation: For each vehicle, the job evaluates its state based on two main scenarios:
    • Scenario A: Active bike might be unreachable. If a bike is Active, not reserved, and hasn't reported its position for more than 10 minutes, it's considered potentially unreachable.
    • Scenario B: Unreachable bike might be active again. If a bike is marked as Unreachable and is not reserved, the job checks if it has come back online.
  4. State Update: Based on the evaluation, the job calls a Wunder endpoint (/refresh-state) to determine the bike's actual reachability. If the endpoint returns an error, the bike is considered unreachable. If it returns a success status, the bike is reachable.
  5. Notifications: The job sends a notification to the #alerts-vehicles-unreachable Slack channel for every state change, providing visibility into which bikes have become unreachable or have recovered.

    Slack channel alerts for unreachable bikes

Technical Specification

This workflow is implemented as a Go application that runs as a GCP Cloud Run job and is maintained in the forest-jobs and forest-infrastructure repos.

Repos:

  • forest-jobs
  • forest-infrastructure

Key Components

  • WunderClient: A client to interact with Wunder's Backend API v2.
    • GetAllVehicles: Fetches all vehicles from /api/v2/vehicles/cached.
    • IsVehicleReachable: Pings a vehicle by calling GET /api/v2/vehicles/{id}/refresh-state. A 200 OK response means the vehicle is reachable. Any other status code or network error implies it is unreachable.
    • ChangeVehicleState: Updates a vehicle's state with a PATCH request to /api/v2/vehicles/{id}.
  • VehicleProcessor: Contains the core logic to process each vehicle.
  • SlackNotifier: Sends formatted messages to a Slack webhook for real-time alerts.

Logic Details

  • Vehicle states, based on the Wunder vehicle states (https://humanforest.backend.fleetbird.eu/vehicle-state/index):
    • VehicleActive: 0
    • VehicleUnreachable: 3
  • Idle Threshold: A vehicle is considered idle if its positionOriginatedAt timestamp is older than 10 minutes (MaxMinutesInactive).

Execution Flow

  1. Configuration: The job loads its configuration from environment variables, including Wunder API credentials and the Slack webhook URL. A DRY_RUN mode is available for testing, which prevents any state changes from being made.
  2. handleIdleVehicle:
    • Trigger: Vehicle is Active, not reserved, and inactive for > 10 minutes.
    • Action: Calls IsVehicleReachable.
    • Outcome: If the vehicle is not reachable, its state is changed to VehicleUnreachable, and an "Unreachable vehicle" notification is sent to Slack.
  3. handleUnreachableVehicle:
    • Trigger: Vehicle is Unreachable and not reserved.
    • Action: Calls IsVehicleReachable.
    • Outcome: If the vehicle is reachable, its state is changed back to VehicleActive, and a "Vehicle reachable again!" notification is sent to Slack.
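
The two handlers and the DRY_RUN guard can be sketched together as shown below. This is a simplified model and not the actual VehicleProcessor: the dependencies are plain function fields so the example stays self-contained, the `Idle` flag stands in for the positionOriginatedAt check, and all type and field names are assumptions.

```go
package main

const (
	VehicleActive      = 0
	VehicleUnreachable = 3
)

// Vehicle is a hypothetical, trimmed-down view of a Wunder vehicle.
type Vehicle struct {
	ID       int
	State    int
	Reserved bool
	Idle     bool // true when the last position is older than 10 minutes
}

// VehicleProcessor holds its dependencies as functions (an assumption of this sketch).
type VehicleProcessor struct {
	IsReachable func(id int) bool
	ChangeState func(id, state int) error
	Notify      func(text string)
	DryRun      bool
}

// Process evaluates one vehicle and returns true if its state was changed.
func (p *VehicleProcessor) Process(v Vehicle) (bool, error) {
	if v.Reserved {
		return false, nil // reserved vehicles are never touched
	}
	switch {
	case v.State == VehicleActive && v.Idle: // handleIdleVehicle
		if p.IsReachable(v.ID) {
			return false, nil
		}
		return p.transition(v, VehicleUnreachable, "Unreachable vehicle")
	case v.State == VehicleUnreachable: // handleUnreachableVehicle
		if !p.IsReachable(v.ID) {
			return false, nil
		}
		return p.transition(v, VehicleActive, "Vehicle reachable again!")
	}
	return false, nil
}

// transition applies the state change and notifies Slack, unless DryRun is set.
func (p *VehicleProcessor) transition(v Vehicle, state int, msg string) (bool, error) {
	if p.DryRun {
		return false, nil
	}
	if err := p.ChangeState(v.ID, state); err != nil {
		return false, err
	}
	p.Notify(msg)
	return true, nil
}
```

In this sketch the reachability check still runs in dry-run mode and only the state change and the notification are skipped, which matches the sequence diagram in this document.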

This automated process ensures that the operational state of the fleet is correctly represented in our systems, which is crucial for both user experience and operational efficiency.

Sequence Diagram

The following diagram explains how the process works. It was generated by AI and then reviewed, but always compare it with the current code implementation in the repo.

sequenceDiagram
    autonumber

    participant Main as main()
    participant WC as WunderClient
    participant VP as VehicleProcessor
    participant SN as SlackNotifier
    participant Slack as Slack API

    Main->>Main: loadConfig()
    Main->>WC: NewWunderClient()
    Main->>SN: NewSlackNotifier()
    Main->>VP: NewVehicleProcessor()

    Main->>WC: GetAllVehicles()
    WC-->>Main: List<Vehicle>

    loop For each batch of 500 vehicles
        Main->>VP: Process(vehicle) (goroutine)

        alt Vehicle eligible for idle check (state: Active && !Reserved && >10min)
            VP->>WC: IsVehicleReachable(id)
            WC-->>VP: reachable?

            alt reachable == false AND DryRun == false
                VP->>WC: ChangeVehicleState(id, Unreachable)
                WC-->>VP: OK

                VP->>SN: Notify( Unreachable )
                SN->>Slack: POST webhook
                Slack-->>SN: 200 OK

                VP->>VP: processedIdle++
            else reachable == true OR DryRun == true
                VP->>VP: No state change
            end

        else Vehicle is unreachable && not reserved
            VP->>WC: IsVehicleReachable(id)
            WC-->>VP: reachable?

            alt reachable == true AND DryRun == false
                VP->>WC: ChangeVehicleState(id, Active)
                WC-->>VP: OK

                VP->>SN: Notify( Reachable again )
                SN->>Slack: POST webhook
                Slack-->>SN: 200 OK

                VP->>VP: processedReachable++
            else reachable == false OR DryRun == true
                VP->>VP: No state change
            end
        end
    end

    Main->>Main: wgSlack.Wait()
    Main->>Main: print stats (processedIdle, processedReachable)
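
The batched, concurrent processing shown in the diagram could look roughly like the sketch below. It assumes one goroutine per vehicle and that each batch finishes before the next one starts; the diagram does not confirm either detail, so check the repo for the real behaviour. `ProcessInBatches` and its parameters are hypothetical names.

```go
package main

import "sync"

// defaultBatchSize matches the batch size of 500 shown in the diagram.
const defaultBatchSize = 500

// ProcessInBatches runs process for every ID, one goroutine per vehicle,
// waiting for each batch to finish before starting the next one.
// It returns how many calls to process reported a state change.
func ProcessInBatches(ids []int, size int, process func(id int) bool) int {
	if size <= 0 {
		size = defaultBatchSize
	}
	var (
		mu      sync.Mutex // guards changed across goroutines
		changed int
	)
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		var wg sync.WaitGroup
		for _, id := range ids[start:end] {
			wg.Add(1)
			go func(id int) {
				defer wg.Done()
				if process(id) {
					mu.Lock()
					changed++
					mu.Unlock()
				}
			}(id)
		}
		wg.Wait() // limit in-flight requests to one batch at a time
	}
	return changed
}
```

Bounding concurrency per batch keeps the number of simultaneous refresh-state calls against the Wunder backend predictable.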