- GFlowNet: a proposal engine that samples edits (small graph changes) in proportion to how good they look.
- Focus‑Flow: a physics‑style process that keeps a live attention field (π) over the graph (what the network cares about now).
Why marry them
- Let π steer what to try next; let accepted edits reshape π. That closes the loop so exploration stays useful and the network keeps learning.
The loop (5 steps)
- Snapshot the current focus (π_t).
- GFlowNet proposes a batch of edits (add link, up‑weight, attach evidence…).
- Score each edit with a fast local focus‑gain estimate Δπ̂ (no global recompute).
- Pass budget/guard checks → commit the best subset.
- Recompute focus π_{t+1}, train GFlowNet on what worked, repeat (see the sketch below).
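A minimal sketch of one tick of this loop, assuming everything is passed in as a stand‑in: `propose`, `estimate_dpi`, `passes_guards`, `recompute_focus`, and the `graph` methods are hypothetical names for the components above, not an existing API.

```python
# One tick of the closed loop. All collaborators are injected as functions
# because they are hypothetical stand-ins, not an existing API.
def loop_step(graph, propose, estimate_dpi, passes_guards, recompute_focus,
              max_edits):
    pi_t = recompute_focus(graph)                    # 1. snapshot π_t
    edits = propose(graph, pi_t)                     # 2. candidate batch
    scored = sorted(edits, key=lambda e: estimate_dpi(graph, pi_t, e),
                    reverse=True)                    # 3. fast local Δπ̂
    accepted = [e for e in scored if passes_guards(graph, e)][:max_edits]  # 4.
    for e in accepted:
        graph.apply(e)                               #    commit the best subset
    pi_next = recompute_focus(graph)                 # 5. refresh focus
    return accepted, pi_t, pi_next                   # feed GFlowNet training
```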
What’s rewarded
- Edits that increase order/information (lower free energy / raise useful focus) get paid; noise burns fees. Incentives match the global objective.
Guardrails (so it doesn’t go off the rails)
- Quotas per topic, fees, and rate limits at hot nodes.
- Proofs/audits for suspicious Δπ̂.
- Costs in the reward: storage/compute/network all counted.
- Rollback window + revert metrics keep reliability in check.
Glossary (one‑liners)
- π: the network’s current attention allocation over nodes.
- Δπ̂: quick estimate of how much an edit would shift π locally.
- Edit/Diff: a small set of graph changes (e.g., add_link u→v with weight/tag).
- R(x): the reward used by GFlowNet when proposing edits.
- SubTB: a training trick (sub‑trajectory balance) that spreads credit over long edit sequences.
- Cyberlink: a signed edge; the atomic “fact” we add.
Tiny worked example (concrete)
- Question spikes interest in Cat.
- π raises near Cat; GFlowNet proposes: (Cat→Animal [h‑edge]), (Cat→Wikipedia‑Cat [d‑edge]); both are shown as data below.
- Δπ̂ says both improve coverage with low cost; budgets OK.
- Commit; Focus‑Flow diffuses attention to Animal + sources.
- π_{t+1} stabilizes with better hierarchy & references.
- GFlowNet is trained on this success pattern; next time it proposes similar high‑yield motifs.
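The two proposals from this example, written as data. The `Edit` container is a hypothetical minimal schema for the edit language used throughout, not a fixed format.

```python
from dataclasses import dataclass

@dataclass
class Edit:
    op: str              # e.g. "add_link"
    src: str
    dst: str
    tag: str = ""        # h-edge (hierarchy) vs d-edge (evidence/source)
    weight: float = 1.0

# The Cat example as a candidate batch: one hierarchy edge, one evidence edge.
proposals = [
    Edit("add_link", "Cat", "Animal", tag="h-edge"),
    Edit("add_link", "Cat", "Wikipedia-Cat", tag="d-edge"),
]
```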
— gflownet × focus-flow — key insights (chapters)
chapter 1 — overview and thesis
- gflownet is a proposal engine that samples structured edits with probability proportional to a reward r(x)
- focus-flow computes a global attention field π over the network graph from ongoing activity
- we marry them by letting π shape r(x), while each accepted edit updates the graph and shifts π
chapter 2 — where they align
- both turn unnormalized scores into stochastic processes with good stationary behavior
- both help approximate intractable sums: gflownet via learned flows and balance losses; focus-flow via fast diffusion/consensus on the graph
- both admit forward/backward views: gflownet uses p_f/p_b over constructive dags; focus-flow has diffusion and global correction phases
chapter 3 — where they differ
- objective: gflownet learns to generate; focus-flow computes to rank
- substrate: gflownet reasons over trajectories of edits; focus-flow maintains a global stationary distribution over existing nodes/edges
- agency: gflownet is agentic and exploratory; focus-flow is deterministic aggregation with economic constraints
chapter 4 — coupling pattern (one line)
- r(x) = exp(β·φ(x, π_t) + u_task(x) − cost(x) + novelty(x))
- retrain gflownet against a lagged snapshot π_t while the network updates to π_{t+1}
chapter 5 — edit language (action/state/reward)
- actions: add_link(u→v, w, tag) · upweight(u→v, Δw) · spawn_node(z) · attach_evidence(v, blob_id)
- state: a partial subgraph (the pending diff) plus local context features
- terminal: a validated batch of edits ready to commit
- reward: r(x) = exp(β₁·Δπ̂(x) + β₂·u_task(x) − β₃·cost(x) + β₄·novelty(x)) (sketched as code below)
- Δπ̂(x): fast local estimate of focus lift from applying diff x (learned surrogate or incremental rank)
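the reward above as a function; the four β weights are illustrative placeholders, and every input is assumed to be a precomputed scalar feature of the diff x:

```python
import math

# r(x) = exp(β₁·Δπ̂(x) + β₂·u_task(x) − β₃·cost(x) + β₄·novelty(x))
# β defaults below are placeholders, not tuned settings.
def reward(dpi_hat: float, u_task: float, cost: float, novelty: float,
           b1: float = 4.0, b2: float = 1.0, b3: float = 2.0,
           b4: float = 0.5) -> float:
    return math.exp(b1 * dpi_hat + b2 * u_task - b3 * cost + b4 * novelty)
```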
chapter 6 — training tactics
- use sub-trajectory balance to propagate credit over long edit sequences (loss sketch below)
- stabilize with a lagged focus prior π̂_t (target network) to avoid chasing a moving field
- temper the policy (entropy or temperature τ) for diversity vs exploitation
- mix on-policy sampling with replay of high-Δπ̂ trajectories for sample efficiency
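a minimal sketch of subtb(λ) in pytorch, assuming per-state log-flows and per-step forward/backward log-probabilities have already been gathered for one trajectory, with the terminal log-flow set to log r(x); names and the λ-weighting scheme are assumptions:

```python
import torch

# Sub-trajectory balance over one trajectory of n steps.
# log_f:  (n+1,) log-flows per state, with log_f[n] = log R(x) at the terminal
# log_pf: (n,)   forward log-probs per step; log_pb: (n,) backward log-probs
def subtb_loss(log_f: torch.Tensor, log_pf: torch.Tensor,
               log_pb: torch.Tensor, lam: float = 0.9) -> torch.Tensor:
    n = log_pf.shape[0]
    cum_pf = torch.cat([torch.zeros(1), log_pf.cumsum(0)])
    cum_pb = torch.cat([torch.zeros(1), log_pb.cumsum(0)])
    total, weight = torch.zeros(()), 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            # balance: log F(s_i) + Σ log P_F = log F(s_j) + Σ log P_B
            delta = (log_f[i] + cum_pf[j] - cum_pf[i]
                     - log_f[j] - cum_pb[j] + cum_pb[i])
            w = lam ** (j - i)
            total = total + w * delta ** 2
            weight += w
    return total / weight
```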
chapter 7 — closed-loop algorithm (sketch)
- snapshot π_t and local graph view
- gflownet proposes k candidate diffs x₁…x_k
- compute Δπ̂(x_i) via incremental rank or a learned surrogate
- filter by budgets/guards; select a feasible subset s ⊆ {x_i} (selection sketch below)
- commit s → graph_{t+1}; run focus-flow to compute π_{t+1}
- train gflownet on accepted/rejected trajectories with subtb; update lagged π̂ on schedule
- repeat continuously per bucket/lane
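the selection step alone, as greedy packing by reward under a cost budget and a per-bucket quota; the candidate fields and both limits are hypothetical:

```python
# Greedy feasible-subset selection: take candidates in reward order while
# the cost budget and per-bucket quota hold. A stand-in for real guards.
def select_subset(candidates, max_cost: float, bucket_quota: int):
    chosen, spent, per_bucket = [], 0.0, {}
    for c in sorted(candidates, key=lambda c: c["r"], reverse=True):
        if spent + c["cost"] > max_cost:
            continue
        if per_bucket.get(c["bucket"], 0) >= bucket_quota:
            continue
        chosen.append(c)
        spent += c["cost"]
        per_bucket[c["bucket"]] = per_bucket.get(c["bucket"], 0) + 1
    return chosen
```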
chapter 8 — safety and economics (guards)
- quotas: per-bucket caps, rate limits, and fee weights to prevent spam at hot nodes
- fast checks: existence of referenced blobs, schema/acl validation before inclusion (guard sketch below)
- fraud proofs: require rank-delta proofs or audits on suspicious Δπ̂
- cost shaping: include compute/storage/network costs directly in cost(x)
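a sketch of the cheap pre-inclusion path only (fast checks plus a rate limit at the destination node); `has_blob`, `schema_ok`, and `acl_ok` are hypothetical graph hooks, and fraud proofs live elsewhere:

```python
import time

# Pre-inclusion guard: anything failing here never reaches commit.
class Guard:
    def __init__(self, max_hits_per_s: int = 5):
        self.max_hits = max_hits_per_s
        self.recent: dict[str, list[float]] = {}

    def check(self, edit, graph) -> bool:
        # fast checks: referenced blob exists, schema/acl validate
        if getattr(edit, "blob_id", None) and not graph.has_blob(edit.blob_id):
            return False
        if not (graph.schema_ok(edit) and graph.acl_ok(edit)):
            return False
        # rate limit at the hot (destination) node, 1-second window
        now = time.time()
        hits = [t for t in self.recent.get(edit.dst, []) if now - t < 1.0]
        if len(hits) >= self.max_hits:
            return False
        self.recent[edit.dst] = hits + [now]
        return True
```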
chapter 9 — metrics that actually move the needle
- diversity: entropy of sampled edit types and coverage of node/edge classes
- impact: realized Δπ over time per accepted edit and per unit cost
- stability: spectral gap / mixing time of the focus kernel before vs after edits (computed in the sketch below)
- efficiency: edits per joule and per dollar; gpu occupancy and tail latency
- reliability: revert rate, failed-proof rate, and mean time to consistency across shards
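two of the metrics above in numpy: edit-type entropy for diversity, and the spectral gap 1 − |λ₂| of a row-stochastic focus kernel for stability (dense eigendecomposition, so sandbox-sized graphs only):

```python
import numpy as np
from collections import Counter

def edit_type_entropy(edit_types: list[str]) -> float:
    # Shannon entropy over sampled edit types (higher = more diverse)
    counts = np.array(list(Counter(edit_types).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def spectral_gap(P: np.ndarray) -> float:
    # 1 - |λ₂| of the focus kernel; a larger gap means faster mixing
    mags = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(1.0 - mags[1])
```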
chapter 10 — minimal pseudocode
- training loop: for each epoch → snapshot π_t → for b in 1..batches: sample τ ~ p_f(·|π_t); score r(τ); backprop subtb → every m steps update target π̂ (code below)
- deployment loop: at wall-clock ticks, take top-k by r(τ) under budgets; commit; recompute π; log realized Δπ and training targets
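the training loop above as a skeleton; `gfn`, `graph`, and `reward_fn` are hypothetical stand-ins, and the deployment loop follows the same shape with commit/recompute in place of backprop:

```python
# Sample against a lagged π̂, backprop SubTB, refresh the target on schedule.
def train_loop(gfn, graph, reward_fn, epochs: int, batches: int, m: int):
    pi_hat = graph.compute_focus()                     # lagged snapshot π̂_t
    step = 0
    for _ in range(epochs):
        for _ in range(batches):
            tau = gfn.sample_trajectory(graph, pi_hat)     # τ ~ p_f(·|π̂_t)
            gfn.update_subtb(tau, reward_fn(tau, pi_hat))  # score r(τ); backprop
            step += 1
            if step % m == 0:                          # update target π̂
                pi_hat = graph.compute_focus()
```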
chapter 11 — design notes
- make Δπ̂ differentiable: a small graph net predicts rank deltas from local motifs so reward remains smooth (surrogate sketch below)
- sparsify action space: constrain add_link to whitelisted motifs (triadic closure, co-citation, functional dependencies) for tractability
- multi-scale: run a gflownet per bucket; aggregate compressed Δ-ranks to a hub and feed back as priors
- asynchrony: prefer eventual consistency with bounded staleness windows over global barriers
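the differentiable Δπ̂ note as a minimal pytorch module: an mlp over hand-built local motif features of a candidate edit (degrees, shared neighbours, nearby π mass); the feature set and layer sizes are assumptions, not a prescription:

```python
import torch
import torch.nn as nn

class DeltaPiSurrogate(nn.Module):
    # Predicts a smooth Δπ̂ per candidate edit from local motif features,
    # so the reward stays differentiable end to end.
    def __init__(self, n_features: int = 8, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, motif_feats: torch.Tensor) -> torch.Tensor:
        return self.net(motif_feats).squeeze(-1)
```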
chapter 12 — prototype plan
- single-bucket sandbox with 10⁶ edges, incremental pagerank (push sketch below), and a tiny gflownet over add_link/upweight
- offline evaluator that replays diffs and measures realized Δπ and stability metrics
- ablations: fixed prior vs focus-shaped r(x); tb vs subtb; with/without lagged π̂
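one plausible reading of "incremental pagerank" for the sandbox: forward-push approximate pagerank (Andersen–Chung–Lang style), re-seeding residual at the nodes a diff touches and pushing until tolerance instead of recomputing from scratch; the adjacency format is an assumption:

```python
from collections import defaultdict

# Forward-push PageRank: keep (estimate p, residual r); push residual mass
# until every node's residual is below eps * out-degree. After an edit,
# re-seed r at the touched nodes and call again with the same p, r.
def push_pagerank(adj, seeds, alpha=0.15, eps=1e-6, p=None, r=None):
    p = p if p is not None else defaultdict(float)
    r = r if r is not None else defaultdict(float)
    for s in seeds:
        r[s] += 1.0 / len(seeds)
    queue = list(seeds)
    while queue:
        u = queue.pop()
        out = adj.get(u, ())
        if r[u] <= eps * max(1, len(out)):
            continue                      # already below tolerance
        p[u] += alpha * r[u]
        share = (1.0 - alpha) * r[u] / max(1, len(out))
        r[u] = 0.0
        for v in out:
            r[v] += share
            if r[v] > eps * max(1, len(adj.get(v, ()))):
                queue.append(v)
    return p, r
```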
chapter 13 — open questions
- how to bound global regret when π drifts quickly under heavy edit throughput
- how to prevent collusive edits that game Δπ̂ while hurting downstream utility
- what economic signals best stabilize exploration at scale without collapse to popular hubs
chapter 14 — one-sentence takeaway
- let focus-flow set the beat; let gflownet compose moves that follow the beat yet expand the score, and train on the delta between the two