# Unweighted Layered Graph Traversal: Passing a Crown via Entropy Maximization*

Xingjian Bai† Christian Coester‡ Romain Cosson§

## Abstract

Introduced by Papadimitriou and Yannakakis in 1989, layered graph traversal is a central problem in online algorithms and mobile computing that has been studied for several decades, and which now is essentially resolved in its original formulation. In this paper, we demonstrate that what appears to be an innocuous modification of the problem actually leads to a drastic (exponential) reduction of the competitive ratio. Specifically, we present an algorithm that is $\mathcal{O}(\log^2 w)$-competitive for traversing unweighted layered graphs of width $w$. Our algorithm chooses the agent's position simply according to the probability distribution over the current layer that maximizes the sum of entropies of the induced distributions in the preceding layers.

## 1 Introduction

Exploring an unknown environment with a mobile agent is a fundamental task in robotics and artificial intelligence [PY91]. Despite its importance, very few models allow for rigorous worst-case analysis, especially when the environment is a general graph (or metric space). One successful model introduced in the field of online algorithms is called 'layered graph traversal', and it has been studied extensively since the 1990s [PY91, CL93, Ram95, Bur96, FFK+98, BCR22, BCR23, BCAGL23, CM24]. In this paper, we show that a simple and natural assumption on the problem - specifically, that the unknown graph is unweighted - drastically reduces its competitive ratio.

**Problem setting.** In layered graph traversal, a mobile agent starts from some node - called the source - of an unknown weighted graph $G=(V,E)$, and is tasked to reach some other node - called the target.
The graph is divided into "layers", where the $i$-th layer refers to the set of nodes at combinatorial depth $i$ (i.e., $i$ hops away from the source). The agent is short-sighted in the sense that it only gets to see layer $i+1$ once it is located at layer $i$. Luckily, the agent is broad-sighted, in the sense that it sees all nodes and edges going from layer $i$ to layer $i+1$ when it reaches layer $i$. The problem is parameterized by the width $w$ of the graph, defined as the maximum number of nodes in a layer. The cost of the agent is defined as the total distance traveled until reaching the target. A deterministic (resp. randomized) algorithm for layered graph traversal is said to be $c_w$-competitive if its cost (resp. expected cost) is at most $c_w$ times the length of the shortest path from source to target. The deterministic competitive ratio of layered graph traversal is known to lie between $\Omega(2^w)$ [FFK+98] and $\mathcal{O}(w 2^w)$ [Bur96], and its randomized counterpart was settled to $\Theta(w^2)$ in [BCR22, BCR23]. It is known [FFK+98] that layered graph traversal is equivalent to the special case where the graph is a tree and all edge lengths are either 0 or 1.

In the unweighted variant (where all edge lengths are exactly 1), a depth-first search approach (DFS) is clearly $(2w-1)$-competitive (see Appendix A). Despite the broad interest that layered graph traversal received, no randomized algorithm was shown to improve over this bound until [CM24] recently proposed an $\mathcal{O}(\sqrt{w})$-competitive randomized algorithm in the unweighted setting.

**Our results.** In this paper, we show that the assumption that the graph is unweighted allows for an even more drastic improvement in the competitive ratio, reducing it to $\mathcal{O}(\log^2 w)$.
Our algorithm is remarkably simple: it maintains at all times that the probability distribution of the agent's location maximizes an entropy defined on the revealed part of the graph. The majority of this paper is dedicated to proving this randomized upper bound, based on extending techniques from the online mirror descent framework [BCL+18, BCLL21] with several new ideas. A few simple motivating results, such as tightness of $2w-1$ for deterministic algorithms (achieved by depth-first search), a randomized lower bound of $\Omega(\log w)$, and some natural but unsuccessful randomized algorithms are surveyed in Appendix A.

[^0]: *The arXiv version of the paper can be accessed at https://arxiv.org/abs/2404.16176
    †MIT. The work was carried out while at University of Oxford.
    ‡University of Oxford.
    §Inria, Paris.

![img-0.jpeg](img-0.jpeg)

Figure 1: Mario is a well-known example of an agent that is short-sighted but broad-sighted. The width $w$ corresponds to the screen height, with the target located somewhere off to the right. A natural question is: faced with the unknown maze ahead, which path should Mario choose?

**Applications: playing Mario, passing a crown and learning-augmented algorithms.** An immediate application of unweighted layered graph traversal is illustrated in Figure 1, where the famous video game character Mario is short-sighted and broad-sighted by design.

More generally, there are many situations where an agent is short-sighted in time (because it cannot predict the future) but broad-sighted in space (because it sees all past and current states of the world).
For an amusing example, consider the following application: suppose a king or queen must design a rule for the passing of their crown and wishes to minimize how much it will "travel" between their descendants (supposedly, less movement leads to more political stability). They could decide that the leader of each new generation shall be chosen by an unweighted layered graph traversal algorithm. Note that the traditional method "give the crown to the closest living relative of the last holder" is equivalent to depth-first search, which has weaker competitive guarantees than the algorithm we introduce. Admittedly, the interest of today's monarchies in deviating from traditional methods might be limited.

The principle of passing a crown/baton applies to wider settings, such as combining strategies in the context of learning-augmented algorithms: [ACE+23a] recently proposed to use a layered graph traversal algorithm to choose between $w$ expert strategies, when there is a (metric) cost associated to switching between experts. In the general setting (when the experts move at arbitrary speeds) a weighted layered graph traversal algorithm induces a competitive ratio in $O(w^2)$, but in settings where all expert strategies move at the same speed, an unweighted layered graph traversal algorithm could be employed, resulting in a competitive ratio in $O(\log^2 w)$ against the best combination of expert strategies.

### 1.1 Discussion of techniques

**Entropy maximization.** The downside of deterministic online algorithms is that they are completely predictable. This enables strong lower bounds, as an adversary can construct the input in the way that is most detrimental to the choices of the algorithm. Therefore, the key to achieving better competitive ratios via randomization lies in making algorithms less predictable.
The motivating idea of our algorithm is to take this strategy to the extreme, by aiming for the most unpredictable configuration at all times. The unpredictability of a random decision is typically measured by its entropy, $\sum_u x_u \log \frac{1}{x_u}$, where $x_u$ denotes the probability of decision $u$.

To implement the idea of maximizing unpredictability, a first tempting idea might be to choose the probability distribution over nodes $\ell$ of the current layer that maximizes $\sum_\ell x_\ell \log \frac{1}{x_\ell}$, i.e., the uniform distribution. However, this ignores that the distribution over the current layer also induces distributions over all previous layers; for example, if all but one node of the current layer share the same ancestor in the preceding layer, then the uniform (high-entropy) distribution over the current layer would induce a very non-uniform (low-entropy) distribution over their ancestors in the preceding layer. We wish to maximize not only the entropy of the current layer, but also of their ancestors in each previous layer. Hence, our algorithm chooses the probability distribution over the current layer that maximizes the sum of entropies over all layers.

**Relation to prior mirror descent based online algorithms.** Entropy-driven online algorithms have led to major advances recently [BCN14, BCL+18, BGMN19, BCLL21, CL22, BC22, BCR22]. Unlike our algorithm, these prior algorithms are not expressed as straightforwardly as merely maximizing entropy, but are formulated in the framework of online mirror descent. The entropic functionals (called regularizers) in these prior works involve a perturbation term $\delta>0$, e.g.
$\Phi(x)=\sum\left(x_u+\delta\right)\log\left(x_u+\delta\right)$ in its simplest form, and an algorithm (in the continuous-time framework of [BCL+18, BCLL21]) is then described by a differential inclusion

$$
\nabla^2 \Phi(\boldsymbol{x})\, \dot{\boldsymbol{x}} - \gamma \in -N_K
$$

where $\dot{\boldsymbol{x}}$ is the time derivative of the configuration $\boldsymbol{x}$, $\gamma$ is the "desired direction of movement", and $N_K$ is the normal cone of the polytope $K$ of configurations. The interpretation of $\Phi$ in these cases is that it replaces the Euclidean geometry (which would correspond to the case where $\nabla^2\Phi(x)$ is replaced by the identity matrix, as in gradient descent) by a geometry adapted to the uncertainty of the underlying problem. Although our algorithm also satisfies such an inclusion, it can be formulated even more simply as

$$
x(i) = \underset{x \in K}{\arg\min}\, \Phi_i(x)
$$

where $\Phi_i(x)$ denotes the sum of negative entropies of all layers visible at time $i$. Unlike previous mirror descent based online algorithms, this formulation in fact yields a memoryless algorithm, since the selected $\Phi_i$ only depends on the tree distances between the nodes of the current layer. This also allows us to obtain an alternative expression of our algorithm through elementary operations (see Lemma 2.1).

**Achieving $\mathcal{O}(\log^2 w)$.** The analysis of our algorithm is split into deactivation phases (where nodes that become irrelevant are deactivated) and growth phases (where new nodes are added), which at a high level resemble similar phases previously considered in the "evolving tree game" of [BCR22].
However, our methods of analysing these phases have little in common with [BCR22] and require several novel ideas.

Our regularizer can be equivalently written as

$$
\Phi_i(x) = \sum_u x_u \log x_u = \sum_u h_{p(u)}\, x_u \log \frac{x_u}{x_{p(u)}} \tag{1.1}
$$

where the sum is over vertices $u$ of the Steiner tree of the current layer, $x_u$ is the algorithm's probability of residing in the subtree rooted at $u$, $p(u)$ is the parent of $u$, and $h_u$ denotes the round number $i$ minus the combinatorial depth of $u$ in the tree, i.e., the height of $u$ at round $i$. Note that this quantity is well-defined only because the tree is unweighted, and all nodes at the last layer are at the same combinatorial depth. See Appendix B for the identity between the two sums.

The right-hand side of equation (1.1) reminds us of the "conditional entropy" regularizer employed in [CL22] for metric task systems on hierarchically separated trees. This expression allows us to derive an expression for the change of the conditional probabilities $\frac{x_u}{x_{p(u)}}$ in each step (which does not involve Lagrange multipliers). However, the analysis in [CL22] requires that edge lengths decrease geometrically along root-to-leaf paths, and more crucially it only achieves a competitive ratio linear in the depth of the tree. In our case, the depth of the tree (even if merging edges as in [BCR22]) can be $\Omega(w)$, so an analysis similar to [CL22] would fail to achieve a sublinear competitive ratio. The high depth of the tree is also the obstacle leading to the $\Theta(w^2)$ competitive ratio for weighted layered graph traversal in [BCR22].
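As a quick numerical sanity check, the identity (1.1) between the two sums can be verified on a small tree (a sketch; the tree shape, node names, and distribution below are made-up assumptions, with the conventions $x_r=1$ and $h_u = i - \operatorname{depth}(u)$):

```python
import math

# Hypothetical example tree at round i = 2: root r with children a, b;
# a has leaves a1, a2, a3 and b has leaf b1 (all leaves at depth 2).
parent = {"a": "r", "b": "r", "a1": "a", "a2": "a", "a3": "a", "b1": "b"}
depth = {"r": 0, "a": 1, "b": 1, "a1": 2, "a2": 2, "a3": 2, "b1": 2}
i = 2  # current round

# An arbitrary distribution over the current layer.
x = {"a1": 0.2, "a2": 0.1, "a3": 0.4, "b1": 0.3}
# Subtree masses: x_u is the total mass of the leaves below u; x_r = 1.
x["a"] = x["a1"] + x["a2"] + x["a3"]
x["b"] = x["b1"]
x["r"] = 1.0

h = {u: i - d for u, d in depth.items()}  # height of u at round i

# Left-hand side of (1.1): sum of x_u log x_u over all nodes (x_r log x_r = 0).
lhs = sum(x[u] * math.log(x[u]) for u in x)
# Right-hand side: height-weighted conditional entropies over non-root nodes.
rhs = sum(h[parent[u]] * x[u] * math.log(x[u] / x[parent[u]]) for u in parent)

assert abs(lhs - rhs) < 1e-9
```

The identity is a telescoping argument: since $h_{p(u)} = h_u + 1$ and each internal node's mass is the sum of its children's, the height-weighted terms cancel layer by layer.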
To achieve the exponential improvement to $\mathcal{O}(\log^2 w)$, a key idea of our proof is to separate the analysis of the movement cost when handling deactivation phases, where the probability mass $x_\ell$ of a leaf $\ell$ decreases to 0, into two substeps that are analyzed differently depending on whether $x_\ell$ is smaller or greater than $1/\operatorname{poly}(w)$. These two substeps require two different potential functions, which we combine into an overall potential function that is compatible with both substeps of deactivation phases and also with growth phases.

Another interesting aspect - which echoes a technique in [CM24] - is that the value of the regularizer appears in the potential function, instead of the Bregman divergence with an offline configuration. The Bregman divergence would be problematic for us since, due to the lack of $\delta$-perturbations in $\Phi_i$, its Lipschitz constant can be $\Omega(w)$, even when evaluated only at the relevant online algorithm configurations.

### 1.2 Related work

The setting of layered graph traversal was introduced in a paper titled 'Shortest path without a map' by Papadimitriou and Yannakakis, motivated by applications in navigation robotics [PY91]. For the case of $w=2$, they showed that the deterministic variant of the problem is essentially equivalent to linear search, a problem which dates back to [Bel63] and for which the competitive ratio is equal to 9 [BYCR93]. The first competitive analysis for arbitrary values of $w \in \mathbb{N}$ was obtained by [FFK+98], which coined the term 'layered graph traversal' to emphasize how the agent collects information (i.e., the feedback).
Following a series of improvements [FFK+98, Ram95, Bur96], the competitive ratio of the deterministic variant of the problem was proved to lie between $\Omega(2^w)$ [FFK+98] and $\mathcal{O}(w 2^w)$ [Bur96]. More recently, the competitive ratio of the randomized variant of the problem was resolved up to a constant factor, establishing it at $\Theta(w^2)$ [BCR22, BCR23]. For the unweighted setting, [CM24] recently provided a $\mathcal{O}(\sqrt{w})$-competitive algorithm, leaving a substantial gap compared to the classical $\Omega(\log w)$ lower bound. This paper significantly narrows this gap, presenting a $\mathcal{O}(\log^2 w)$-competitive algorithm.

The problem was also studied under the name 'metrical service systems' [CL93] to highlight its connection to the setting of metrical task systems [BLS92]. The denomination 'small set chasing' was also recently suggested by [BCR22] to point out its connection with more recent works on set chasing (see, e.g., [BLLS19, AGTG21, Sel20]). The problem is additionally related to several other problems in online computation, such as the $k$-taxi problem [CK19], whose lower bound is based on a reduction from layered graph traversal. Very recently, layered graph traversal has also found new applications in the context of learning-augmented algorithms [ACE+23b, ACE+23a].

**Exploration of an unknown environment by a mobile agent.** In layered graph traversal, the agent's objective can be restated as follows: the agent must build a complete map of the underlying graph using the fewest moves possible. Graph exploration with a mobile agent has attracted attention since the 19th century [Luc82, Tar95], marked by the formal introduction of depth-first search (DFS) as a maze-solving algorithm.
DFS is particularly suited for exploring general unknown graphs when the agent is short-sighted and narrow-sighted: the agent traverses all $m$ edges in exactly $2m$ moves, which is optimal in the sense of the competitive ratio. A natural extension of this problem is when the agent can also see its immediate neighborhood (a little less short-sighted) [KP94]. This setting is known as 'online graph exploration' [MMS12], and the goal is for the agent to visit all nodes while incurring a limited cost with respect to the optimal tour (i.e., the solution of the traveling salesman problem). The competitive ratio for that problem is still unsettled (it lies between $\Omega(1)$ and $\mathcal{O}(\log n)$, where $n$ is the number of nodes in the graph) but many restricted classes of graphs admit $\mathcal{O}(1)$-competitive algorithms, such as planar graphs or some minor-excluded graphs (see [BDHS23] and its references). In contrast, layered graph traversal proposes a different model of feedback where the agent gets to see simultaneously all nodes and edges at some given hop distance from its initial position. This setting is particularly well-motivated when the graph evolves with time (e.g., a phylogenetic tree). Many other graph exploration settings have been introduced, varying the capabilities of the agent: e.g., with limited memory [AKL+79], limited storage [YWB03], or considering multiple agents working as a team [FGKP06, CMV23, CM24]. For a recent review of graph exploration problems, see the introduction of [Cos24]. In contrast to the aforementioned examples, this paper studies the case of a single, fully-competent agent.

### 1.3 Notations and preliminaries

Consider a graph $G=(V,E)$ defined by a set of nodes $V \subset \mathcal{V}$ and edges $E$ over $V$, where $\mathcal{V}$ is an arbitrary countable set. The edges are undirected and unweighted, and the graph is connected. One node is called the root and is denoted by $r \in V$.
For some $i \in \mathbb{N}$, we denote by $\mathcal{L}(i)$ the set of nodes at combinatorial depth $i$, i.e., $i$ hops away from the root. We will refer to $\mathcal{L}(i)$ as the $i$-th layer of the graph. The width $w$ of the graph is defined as the maximum cardinality of its layers, i.e., $w = \max_i |\mathcal{L}(i)|$. As is standard in layered graph traversal, the goal of reaching a certain target can be equivalently simplified to reaching the last layer of the graph.

**Reduction to trees.** For any node $u \in V$, we denote by $i_u$ the combinatorial depth of $u$, i.e., satisfying $u \in \mathcal{L}(i_u)$. At any step $i \geq i_u$, we call $h_u(i) = i - i_u$ the height of $u$. For any node $u \in V \setminus \{r\}$, we can arbitrarily pick a node of the preceding layer $p(u) \in \mathcal{L}(i_u - 1)$ such that $(p(u), u) \in E$.[^1] It is clear that traversing the tree formed by $T = \{(p(u), u) \mid u \in V \setminus \{r\}\} \subseteq E$ in an online manner is no less difficult than traversing the original graph. For this reason, we will focus the rest of the paper on the specific case where the layered graph is a tree.

[^1]: Observe that the choice of $p(u)$ can be made online by the agent. The reduction from graphs to trees is standard in the literature on layered graph traversal. Here, we use the assumption that any non-root node is connected to a node in the preceding layer because a layer refers to the set of nodes at the same combinatorial depth. We note that this assumption is not required in the weighted variant of the problem; see [FFK+98]. For more motivation on the setting of this paper, we refer to [CM24].
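The reduction can be sketched in a few lines: each non-root node picks one arbitrary neighbor in the preceding layer as its parent $p(u)$, a choice the agent can make online. (A minimal sketch; the adjacency representation and the width-2 example graph are our own assumptions.)

```python
def build_spanning_tree(layers, edges):
    """For each non-root node, pick an arbitrary parent p(u) in the
    preceding layer. Traversing the resulting tree T online is no easier
    than traversing the original layered graph.

    layers: list of lists of nodes, with layers[0] == [root]
    edges:  set of frozensets {u, v} with u, v in consecutive layers
    """
    parent = {}
    for i in range(1, len(layers)):
        for u in layers[i]:
            # Any neighbor of u in layer i-1 works; take the first found.
            parent[u] = next(v for v in layers[i - 1]
                             if frozenset((u, v)) in edges)
    return parent

# Hypothetical width-2 example: node "c" is connected to both "a" and "b",
# so its parent in T is an arbitrary choice between the two.
layers = [["r"], ["a", "b"], ["c"]]
edges = {frozenset(p) for p in [("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")]}
parent = build_spanning_tree(layers, edges)
assert parent["a"] == "r" and parent["b"] == "r"
assert parent["c"] in {"a", "b"}  # the choice is arbitrary
```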
For any node $u \in V$, we denote by $c(u) \subset V$ the set of children of $u$, and by $\mathcal{L}_u(i)$ the subset of nodes of $\mathcal{L}(i)$ that are descendants of $u$, for some $i \in \mathbb{N}$. We also denote by $u \to r \subset V$ the set of all nodes on the path from $u$ to $r$, excluding $r$.

**Layered graph traversal algorithm.** Formally, a randomized unweighted layered graph traversal algorithm is defined by a function $\mathcal{A}: \mathcal{T} \times \mathcal{V} \to \mathcal{P}(\mathcal{V})$ mapping a tree $T \in \mathcal{T}$ and a node $u \in \mathcal{V}$ in the penultimate layer of $T$ to a distribution $\boldsymbol{x} \in \mathcal{P}(\mathcal{V})$ over the set of nodes in the ultimate layer of $T$. The algorithm is deterministic if the distribution is always supported on a unique node. The cost of the algorithm is defined as the total number of edges traversed by an agent that picks its next position using $\mathcal{A}$ at each layer and uses shortest paths to move between the positions elected in consecutive layers. The total cost of the agent until reaching layer $i$ depends on the random choices taken by the agent (if the algorithm is randomized) and we therefore focus on its expectation, which we denote by $\operatorname{Cost}(i)$. The algorithm is $c_w$-competitive if, for any graph of width $w$ and any step $i$, it satisfies $\operatorname{Cost}(i) \leq c_w i$.

**Fractional perspective on unweighted layered graph traversal.** A classical equivalent perspective on layered graph traversal - which is referred to as the fractional view - is that the algorithm $\mathcal{A}: \mathcal{T} \to \mathcal{P}(\mathcal{V})$ maintains a distribution on the ultimate layer of the tree given as input. The movement cost charged to the algorithm is then the optimal transport cost between the two consecutive distributions.
Given a fractional algorithm for layered graph traversal, one gets a randomized algorithm with the same cost by considering at each step an optimal coupling between the two consecutive distributions and sampling the next position of the agent according to the conditional distribution associated with its current position. We refer to [BCR22, Section 4] for more details on this reduction.

**The active tree and the active polytope.** At any round $i$, we say that a node is 'active' if it has a descendant in $\mathcal{L}(i)$, and we denote the set of all active nodes by $V(i)$. A node is 'deactivated' at round $i$ if it ceases to be active at that round, and the set of all nodes deactivated until round $i$ is denoted by $\bar{V}(i)$. We will essentially focus on the 'active tree' that is formed by the nodes in $V(i)$. Observe that this tree is alternatively defined as the Steiner tree of $\mathcal{L}(i) \cup \{r\}$. We will denote by $\boldsymbol{x}(i) \in \mathcal{P}(\mathcal{L}(i))$ the probability distribution associated with the position of the agent on the $i$-th layer. Instead of viewing $\boldsymbol{x}$ as a probability distribution, it will be useful to consider it more generally as a point of the active polytope $K(i)$, which we define as

$$
K(i) = \left\{\boldsymbol{x} \in \mathbb{R}^{V(i)\setminus\{r\}} \;\middle|\; \forall u \in V(i) \setminus \mathcal{L}(i):\ x_u = \sum_{v \in c(u)} x_v \right\}
$$

We emphasize that $x_r$ is defined once and for all to be 1. For that reason, it does not appear as a variable in the definition of the active polytope above (but $x_r$ does appear in one constraint, which effectively enforces $\sum_{\ell \in \mathcal{L}(i)} x_\ell(i) = 1$). We note that a probability distribution must also satisfy non-negativity constraints, which are not explicit in $K(i)$.
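Concretely, a distribution over the current layer determines a unique point of $K(i)$: the subtree masses of internal nodes are forced by the constraints. A small sketch (the active tree and the leaf distribution are made-up assumptions):

```python
# Small made-up active tree: root r with children a, b; the current layer
# consists of the leaves a1, a2 (under a) and b1 (under b).
children = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}

def lift(leaf_dist, children):
    """Extend a distribution on the current layer to subtree masses x_u,
    so that the constraints defining K(i) hold by construction."""
    x = dict(leaf_dist)
    def mass(u):
        if u not in children:          # u is a leaf of the current layer
            return x[u]
        x[u] = sum(mass(v) for v in children[u])
        return x[u]
    mass("r")
    return x

x = lift({"a1": 0.25, "a2": 0.25, "b1": 0.5}, children)
# Membership in K(i): every internal node carries the mass of its children,
# and the root constraint x_r = 1 enforces that the leaf masses sum to 1.
assert all(abs(x[u] - sum(x[v] for v in children[u])) < 1e-12 for u in children)
assert abs(x["r"] - 1.0) < 1e-12
```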
It will be obvious that the configuration $\boldsymbol{x}(i) \in K(i)$ defined by our algorithm satisfies the positivity constraints (with strict inequality) and can indeed be interpreted as $\boldsymbol{x}(i) \in \mathcal{P}(\mathcal{L}(i))$. We note that $K(i)$ is an affine subspace of $\mathbb{R}^{V(i)\setminus\{r\}}$. Denoting by $\boldsymbol{e}_u \in \mathbb{R}^{V(i)\setminus\{r\}}$ the canonical basis vector associated with the element $u \in V(i) \setminus \{r\}$, we define the normal cone $N_{K(i)}$ associated with $K(i)$,

$$
\begin{aligned}
N_{K(i)} &= \operatorname{Vect}\left(\left\{\boldsymbol{e}_u - \sum_{v \in c(u)} \boldsymbol{e}_v : u \in V(i) \setminus \mathcal{L}(i)\right\}\right) \\
&= \left\{\boldsymbol{\xi} \in \mathbb{R}^{V(i)\setminus\{r\}} : \xi_u = \lambda_{p(u)} - \lambda_u \text{ with } \boldsymbol{\lambda} \in \mathbb{R}^{V(i)} \text{ and } \lambda_\ell = 0 \text{ for } \ell \in \mathcal{L}(i)\right\}
\end{aligned}
$$

which is the vector space orthogonal to the affine subspace $K(i)$.

**Optimal transport in a tree.** For two given configurations $\boldsymbol{x}(i) \in K(i)$ and $\boldsymbol{x}(i+1) \in K(i+1)$, we denote the optimal transport cost (also known as the earth-mover distance) between $\boldsymbol{x}(i)$ and $\boldsymbol{x}(i+1)$ by $\operatorname{OT}(\boldsymbol{x}(i), \boldsymbol{x}(i+1))$.
Extending the definition of a configuration to take the value zero outside of the active tree, the optimal transport cost on a tree is classically expressed as

$$
\operatorname{OT}(\boldsymbol{x}, \boldsymbol{x}') = \sum_{u \in V} \left|x'_u - x_u\right|
$$

We note that this definition matches the usual definition of optimal transport on the metric space associated with the underlying tree.

**Derivatives and instantaneous movement costs.** The derivative of a differentiable real function $f(\cdot)$ with respect to a single variable $t \in \mathbb{R}$ will be denoted by $\partial_t f(t)$. The gradient of a twice-differentiable multivariate function $\Phi(\boldsymbol{x})$ will be denoted by $\nabla \Phi(\boldsymbol{x})$, and its Hessian will be denoted by $\nabla^2 \Phi(\boldsymbol{x})$. For convenience, when the function that is differentiated is a configuration $\boldsymbol{x}(t) \in K(i)$, we will adopt the physicist's notation and denote its derivative $\partial_t \boldsymbol{x}(t)$ by $\dot{\boldsymbol{x}}$. Thus, at an arbitrary node $u \in V(i)$, we will also denote $\partial_t x_u(t)$ by $\dot{x}_u$. In general, we will often drop the dependence on time when it is clear from the context. We observe that if $\forall t: \boldsymbol{x}(t) \in K(i)$, then $\dot{\boldsymbol{x}}(t)$ is orthogonal to the normal cone $N_{K(i)}$, i.e., $\dot{\boldsymbol{x}}(t) \cdot \boldsymbol{\xi} = 0$ for $\boldsymbol{\xi} \in N_{K(i)}$.
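The tree OT formula can be checked on a toy instance: moving mass $\delta$ between two leaves at tree distance 2 should cost $2\delta$, since $\delta$ crosses two edges. (A sketch; the depth-1 tree and the distributions are our own assumptions.)

```python
# Toy active tree: root r with two children a, b, the current leaves.
parent = {"a": "r", "b": "r"}

def lift(leaf_dist, parent):
    """Subtree masses x_u from a distribution on the leaves (depth-1 tree)."""
    x = dict(leaf_dist)
    for u, p in parent.items():
        x[p] = x.get(p, 0.0) + x[u]
    return x

def ot(x, x2):
    """Optimal transport cost on the tree: sum over nodes of |x2_u - x_u|
    (nodes absent from a configuration carry mass zero)."""
    nodes = set(x) | set(x2)
    return sum(abs(x2.get(u, 0.0) - x.get(u, 0.0)) for u in nodes)

delta = 0.2
before = lift({"a": 0.5, "b": 0.5}, parent)
after = lift({"a": 0.5 - delta, "b": 0.5 + delta}, parent)
# Leaves a and b are 2 edges apart, so moving delta costs 2 * delta;
# the root term contributes 0 since x_r = 1 in both configurations.
assert abs(ot(before, after) - 2 * delta) < 1e-12
```

The sum over nodes works because each unit of mass crossing the edge $(p(u), u)$ changes the subtree mass $x_u$ by exactly that amount.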
We define the instantaneous movement cost at instant $t$ as

$$
\partial_t \operatorname{Cost}(t) = \sum_{u \in V(i)} \left|\dot{x}_u\right|
$$

By the triangle inequality, it is clear that for any two configurations $\boldsymbol{x} \in K(i)$ and $\boldsymbol{x}' \in K(i)$, if $\boldsymbol{x}(t)$ is a differentiable function of $t$ interpolating between $\boldsymbol{x}(a) = \boldsymbol{x}$ and $\boldsymbol{x}(b) = \boldsymbol{x}'$ for $a < b$, we have $\operatorname{OT}(\boldsymbol{x}, \boldsymbol{x}') \leq \int_a^b \partial_t \operatorname{Cost}(t)\, dt$. Overloading notation, we will use the letter $i$ to refer to the actual configurations $\boldsymbol{x}(i)$ taken by our algorithm at discrete time steps, and the letter $t$ to denote continuously evolving configurations $\boldsymbol{x}(t)$ interpolating between two consecutive configurations.

**Convexity.** For a twice-differentiable convex function $\Phi$ defined on an open domain $U$, we denote the Bregman divergence associated to $\Phi$ by $D(\cdot,\cdot)$. It is defined for any $\boldsymbol{x}, \boldsymbol{x}' \in U$ as $D(\boldsymbol{x}', \boldsymbol{x}) = \Phi(\boldsymbol{x}') - \Phi(\boldsymbol{x}) - \nabla\Phi(\boldsymbol{x}) \cdot (\boldsymbol{x}' - \boldsymbol{x})$, and it is always non-negative.

## 2 Algorithm and analysis

In this section, we present and analyze our algorithm for layered graph traversal, which is stated here in its fractional formulation (see Section 1.3 for the definition).
When reaching layer $i \in \mathbb{N}$, the algorithm chooses the configuration $\boldsymbol{x}(i)$ that is the minimizer of the following expression,

$$
\boldsymbol{x}(i) \in \underset{\boldsymbol{x} \in K(i)}{\arg\min} \sum_{u \in V(i)} x_u \log x_u \tag{2.3}
$$

the objective of which we denote by $\Phi_i$.[^2] We observe that $\Phi_i$ can equivalently be rewritten as

$$
\Phi_i(\boldsymbol{x}) = \sum_{u \in V(i)} x_u \log x_u = \sum_{u \in V(i)\setminus\{r\}} h_{p(u)}(i)\, x_u \log \frac{x_u}{x_{p(u)}}
$$

where $h_{p(u)}(i)$ denotes the height of $p(u)$ at round $i$. See Appendix B for details. The rest of this section is devoted to proving the following theorem, which is the main result of the paper.

**Theorem 2.1.** The fractional algorithm defined by (2.3) is $\mathcal{O}(\log^2 w)$-competitive for unweighted layered graph traversal. In other words, the associated randomized algorithm satisfies, for any unweighted graph of width $w$, that its expected cost at every round $i$ obeys

$$
\operatorname{Cost}(i) \leq \mathcal{O}(\log^2 w) \cdot i
$$

*Proof sketch.* For any layer $i \in \mathbb{N}$, we define the potential

$$
P(i) = \frac{4}{w}|\bar{V}(i)| + 4\log w \cdot \Phi_i(\boldsymbol{x}(i)) + 6(1+\log w)^2\, i
$$

where $|\bar{V}(i)|$ denotes the number of nodes that have been deactivated by round $i$ (see Section 1.3). We will prove that the cost of the fractional algorithm up to round $i$ - which we denote by $\operatorname{Cost}(i)$ - is always bounded by the potential,

$$
\operatorname{Cost}(i) \leq P(i) \tag{2.4}
$$

[^2]: We note that $\Phi_i$ is defined on the open domain $U = \mathbb{R}_{>0}^{V(i)\setminus\{r\}}$ and that the minimizer is effectively taken on $K(i) \cap U$. Such a minimizer exists due to the property that $\partial_{x_u} \Phi_i(\boldsymbol{x}) \to -\infty$ when $x_u \to 0$.
    It is also unique, by the strong convexity of $\Phi_i$.

Since $|\bar{V}(i)| \leq wi$ and $\Phi_i(\boldsymbol{x}(i)) \leq 0$, this implies that $\operatorname{Cost}(i) \leq 4i + 6(1+\log w)^2 i = \mathcal{O}(\log^2 w) \cdot i$.

The proof is by induction on $i$. When moving from layer $i$ to layer $i+1$, the goal is to show that the increase in cost, which equals $\operatorname{OT}(\boldsymbol{x}(i), \boldsymbol{x}(i+1))$, is at most the variation of the potential, which is equal to $P(i+1) - P(i)$. Instead of considering the optimal transport from $\boldsymbol{x}(i) \in K(i)$ to $\boldsymbol{x}(i+1) \in K(i+1)$ directly, we consider a series of (possibly sub-optimal) phases that together form a transport between $\boldsymbol{x}(i)$ and $\boldsymbol{x}(i+1)$. There are two main phases.

1. For each node $\ell \in \mathcal{L}(i)$ that does not have a descendant in $\mathcal{L}(i+1)$, we displace the fractional mass present at $\ell$ to obtain $x_\ell = 0$. We call this phase the deactivation phase. The deactivation of leaf $\ell$ is performed continuously by setting $\boldsymbol{x}(t) = \arg\min_{\boldsymbol{x} \in K(i)} \left(\Phi_i(\boldsymbol{x}) + t x_\ell\right)$ and letting $t \to \infty$. After the deactivation of $\ell$, in a slight abuse of notation, we remove $\ell$ from the set $V(i)$ and write $K(i)$ for the new corresponding polytope. After consecutively performing all deactivations, we obtain a new configuration, $\boldsymbol{x}'$, and re-interpret it as a point in $K(i+1)$ by setting $x_\ell = x_{p(\ell)}/|c(p(\ell))|$ for each $\ell \in \mathcal{L}(i+1)$. At the end of the deactivation phase, we have $\boldsymbol{x}' \in \arg\min_{\boldsymbol{x} \in K(i+1)} \Phi_i(\boldsymbol{x})$.
The deactivation phase is studied in Section 2.2, where we show the following bound on the associated movement cost:

$$
\operatorname{OT}\left(\boldsymbol{x}(i), \boldsymbol{x}^{\prime}\right) \leq \frac{4}{w}\left(|\bar{V}(i+1)|-|\bar{V}(i)|\right)+4 \log w \cdot\left(\Phi_{i}\left(\boldsymbol{x}^{\prime}\right)-\Phi_{i}(\boldsymbol{x}(i))\right)
$$

2. Then, we consider the configurations $\boldsymbol{x}(t)=\arg \min _{\boldsymbol{x} \in K(i+1)} \Phi_{i}(\boldsymbol{x})+t \sum_{\ell \in \mathcal{L}(i+1)} x_{\ell} \log x_{\ell}$ for values of $t$ that increase continuously from 0 to 1, which interpolate between $\boldsymbol{x}^{\prime}$ and $\boldsymbol{x}(i+1)$. We call this the growth phase. After the growth, we add a cost of 1 to account for the ultimate movement from $\mathcal{L}(i)$ to $\mathcal{L}(i+1)$. The growth phase is studied in Section 2.3, where we will see that

$$
\begin{aligned}
\operatorname{OT}\left(\boldsymbol{x}^{\prime}, \boldsymbol{x}(i+1)\right) & \leq 2(1+\log w)^{2}+1 \\
-\log w & \leq \Phi_{i+1}(\boldsymbol{x}(i+1))-\Phi_{i}\left(\boldsymbol{x}^{\prime}\right)
\end{aligned}
$$

Combining the above equations as $(2.5)+(2.6)+4 \log w \cdot(2.7)$ yields

$$
\operatorname{OT}(\boldsymbol{x}(i), \boldsymbol{x}(i+1)) \leq P(i+1)-P(i)
$$

which in turn implies (2.4) by induction. Inequalities (2.5), (2.6) and (2.7) also help to understand the design of the potential $P(i)$, which is composed of two terms to control the movement cost in deactivation phases and a term to control the movement cost in growth phases. Having two different terms for deactivation phases stems from two different ways of bounding the cost in these phases, depending on whether $x_{\ell}$ is larger or smaller than $1 / w^{2}$.

2.1 Explicit algorithm and basic lemmas In this section, we establish some fundamental properties of the algorithm.
The proofs of these lemmas are deferred to Appendix C.

The following lemma (specialized with $\gamma=0$) explicitly describes the configuration of the algorithm at any given layer by specifying the probabilities $\frac{x_{u}}{x_{p(u)}}$ of being in the subtree of $u$ conditioned on being in the subtree of $p(u)$.

Lemma 2.1. (Explicit algorithm) Let $\gamma \in \mathbb{R}^{\mathcal{L}}$ and

$$
\boldsymbol{x}=\underset{\boldsymbol{x} \in K}{\arg \min } \sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log \frac{x_{u}}{x_{p(u)}}+\sum_{\ell \in \mathcal{L}} \gamma_{\ell} x_{\ell}
$$

Then for each $u \in V \backslash\{r\}$,

$$
\frac{x_{u}}{x_{p(u)}}=\frac{y_{u}}{\sum_{v \in c(p(u))} y_{v}}
$$

where $y_{u}$ is defined recursively via

$$
y_{u}= \begin{cases}e^{-\gamma_{u} / h_{p(u)}} & \text { if } u \in \mathcal{L} \\ \left(\sum_{v \in c(u)} y_{v}\right)^{h_{u} / h_{p(u)}} & \text { if } u \notin \mathcal{L}\end{cases}
$$

In particular, if $\gamma=0$, then $1 \leq y_{u} \leq\left|\mathcal{L}_{u}\right|$.

As we will consider continuously evolving configurations, it will be useful to understand their dynamics. These are expressed most naturally by differential equations describing the evolution of the conditional probabilities $\frac{x_{u}}{x_{p(u)}}$. The premise (2.8) of the following lemma will be satisfied for different functions $\gamma$ in deactivation and growth steps.

Lemma 2.2.
(Continuous dynamics) Let $\gamma:[0, \infty) \rightarrow \mathbb{R}^{\mathcal{L}}$ and $\boldsymbol{x}:[0, \infty) \rightarrow K$ be continuously differentiable and satisfy $x_{u}(t)>0$ for all $u \in V$, $t \in[0, \infty)$ and

$$
\nabla^{2} \Phi(\boldsymbol{x}(t)) \dot{\boldsymbol{x}}(t)+\sum_{\ell \in \mathcal{L}} \gamma_{\ell}(t) \boldsymbol{e}_{\ell} \in-N_{K}
$$

for all $t \in[0, \infty)$. Then for each node $u \in V \backslash\{r\}$,

$$
\partial_{t} \frac{x_{u}}{x_{p(u)}}=\frac{1}{h_{p(u)} x_{p(u)}}\left(\frac{x_{u}}{x_{p(u)}} \sum_{\ell \in \mathcal{L}_{p(u)}} x_{\ell} \gamma_{\ell}-\sum_{\ell \in \mathcal{L}_{u}} x_{\ell} \gamma_{\ell}\right)
$$

The next lemma bounds the movement cost in terms of the change of the conditional probabilities.

Lemma 2.3. (Movement cost) The instantaneous cost during continuous movement is at most

$$
2 \sum_{u \in V \backslash\{r\}} h_{p(u)} x_{p(u)}\left(\partial_{t} \frac{x_{u}}{x_{p(u)}}\right)_{-}
$$

where we write $(z)_{-}=\max \{-z, 0\}$.

2.2 Deactivation phase This section examines the deactivation of leaves in $\mathcal{L}(i)$ at a given step $i$. For simplicity, and without loss of generality[^3], we focus on the case where only one leaf $\ell$ is deactivated and we denote by $d_{\ell} \geq 1$ the corresponding number of deactivated nodes, i.e., which had $\ell$ as their unique descendant in $\mathcal{L}(i)$.
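Before turning to deactivations, we note that the explicit form of Lemma 2.1 is straightforward to evaluate numerically. The following is a minimal sketch; the toy tree, the heights, and the $\gamma$ values are illustrative assumptions, not data from the paper.

```python
import math

# Numerical sketch of the explicit form in Lemma 2.1 on a hypothetical
# toy tree (root "r", internal nodes "a"/"b", leaves "l1"/"l2"/"l3").
tree = {              # internal node -> list of children ("r" is the root)
    "r": ["a", "b"],
    "a": ["l1", "l2"],
    "b": ["l3"],
}
height = {"r": 2, "a": 1, "b": 1, "l1": 0, "l2": 0, "l3": 0}
parent = {c: p for p, cs in tree.items() for c in cs}
gamma = {"l1": 0.0, "l2": 0.0, "l3": 0.0}    # one value per leaf

def y(u):
    """Recursive weights y_u from Lemma 2.1."""
    if u not in tree:                         # leaf case
        return math.exp(-gamma[u] / height[parent[u]])
    s = sum(y(v) for v in tree[u])
    if u == "r":                              # the root's y is never normalized
        return s
    return s ** (height[u] / height[parent[u]])

# Conditional probabilities x_u / x_{p(u)}, then absolute masses x_u.
cond = {u: y(u) / sum(y(v) for v in tree[p]) for u, p in parent.items()}
x = {"r": 1.0}
for u in ["a", "b", "l1", "l2", "l3"]:        # parents before children
    x[u] = x[parent[u]] * cond[u]

# Sanity checks: each sibling group is a probability distribution, leaf
# masses sum to 1, and the gamma = 0 bound 1 <= y_u <= |L_u| holds.
assert all(abs(sum(cond[v] for v in cs) - 1) < 1e-9 for cs in tree.values())
assert abs(x["l1"] + x["l2"] + x["l3"] - 1) < 1e-9
assert 1 <= y("a") <= 2 and 1 <= y("b") <= 1
```

With $\gamma = 0$ this reproduces the entropy-maximizing configuration of (2.3) on the toy instance; raising $\gamma_\ell$ for one leaf continuously drains its mass, which is exactly the deactivation mechanism analyzed next.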
Denoting by $\boldsymbol{x}^{\prime}$ the configuration after the deactivations, the goal of this section is to show that

$$
\operatorname{OT}\left(\boldsymbol{x}(i), \boldsymbol{x}^{\prime}\right) \leq 4 d_{\ell} / w+4 \log w \cdot\left(\Phi_{i}\left(\boldsymbol{x}^{\prime}\right)-\Phi_{i}(\boldsymbol{x}(i))\right)
$$

where the configuration after deactivations satisfies

$$
\boldsymbol{x}^{\prime}=\underset{\boldsymbol{x} \in K(i) \text { and } x_{\ell}=0}{\arg \min } \Phi_{i}(\boldsymbol{x})
$$

For $t \in[0, \infty)$, we define $\boldsymbol{x}(t)$ to interpolate between $\boldsymbol{x}(i)$ and $\boldsymbol{x}^{\prime}$ as follows,

$$
\boldsymbol{x}(t)=\underset{\boldsymbol{x} \in K(i)}{\arg \min } \Phi_{i}(\boldsymbol{x})+t x_{\ell}
$$

We start by stating the following claims on $\boldsymbol{x}(t)$:

Lemma 2.4. The following statements are satisfied:

(a) $\boldsymbol{x}(t)$ satisfies the explicit form given in Lemma 2.1 with $\gamma(t)=t \boldsymbol{e}_{\ell}$. Therefore, it is differentiable, and $\boldsymbol{x}(t)$ converges to $\boldsymbol{x}^{\prime}$.

(b) $\boldsymbol{x}(t)$ satisfies the mirror descent equation $\nabla^{2} \Phi_{i}(\boldsymbol{x}(t)) \dot{\boldsymbol{x}}(t) \in-\boldsymbol{e}_{\ell}-N_{K(i)}$ and thus, the dynamics given by Lemma 2.2. Also, $\partial_{t} D\left(\boldsymbol{x}^{\prime}, \boldsymbol{x}(t)\right)=-x_{\ell}(t)$ and $\dot{x}_{\ell}(t) \leq-\frac{1}{2 d_{\ell}} x_{\ell}(t)$.

[^3]: The deactivation of multiple leaves can take place by iterating this method with all elements of $\mathcal{L}(i) \backslash V(i+1)$. Enumerating deactivated leaves in some arbitrary order $\ell_{1}, \ldots, \ell_{j}$, one has $|\bar{V}(i+1)|-|\bar{V}(i)|=d_{\ell_{1}}+\cdots+d_{\ell_{j}}$.

Proof.
We provide a short proof of the above claims.

(a) Explicit form. $\boldsymbol{x}(t)$ clearly satisfies the conditions of Lemma 2.1. Since the expression only involves differentiable operations, $\boldsymbol{x}(t)$ is differentiable. Convergence to $\boldsymbol{x}^{\prime}$ holds because the explicit form of $\boldsymbol{x}(t)$ converges to the explicit form of $\boldsymbol{x}^{\prime}$ (which is given by invoking Lemma 2.1 with $\gamma=0$ and the polytope $K$ where the leaf $\ell$ is removed).

(b) Mirror descent. For any $t \geq 0$, the optimality of $\boldsymbol{x}(t)$ implies $\nabla \Phi_{i}(\boldsymbol{x}(t))+t \boldsymbol{e}_{\ell} \in-N_{K(i)}$. Taking the derivative with respect to $t$, we obtain $\nabla^{2} \Phi_{i}(\boldsymbol{x}(t)) \dot{\boldsymbol{x}}(t) \in-\boldsymbol{e}_{\ell}-N_{K(i)}$. By definition of the Bregman divergence, $\partial_{t} D\left(\boldsymbol{x}^{\prime}, \boldsymbol{x}(t)\right)=\dot{\boldsymbol{x}}(t)^{T} \nabla^{2} \Phi_{i}(\boldsymbol{x}(t))\left(\boldsymbol{x}(t)-\boldsymbol{x}^{\prime}\right)=-x_{\ell}(t)$, where we used that $x_{\ell}^{\prime}=0$ and that $\boldsymbol{x}(t)-\boldsymbol{x}^{\prime}$ is orthogonal to $N_{K(i)}$. Then, applying Lemma 2.2 to the lowest node $u$ on the path $\ell \rightarrow r$ that has an active sibling, and observing that $\left(x_{\ell}(t), \dot{x}_{\ell}(t)\right)=\left(x_{u}(t), \dot{x}_{u}(t)\right)$ and that $h_{p(u)}=d_{\ell}$, we have $d_{\ell}\left(\frac{\dot{x}_{\ell}}{x_{\ell}}-\frac{\dot{x}_{p(u)}}{x_{p(u)}}\right)=\frac{x_{\ell}}{x_{p(u)}}-1$. Then, by Lemma 2.1, we have $\frac{x_{\ell}}{x_{p(u)}} \leq 1 / 2$ and $\dot{x}_{p(u)} \leq 0$, since all conditional probabilities on $\ell \rightarrow r$ are non-increasing. Therefore, $d_{\ell} \frac{\dot{x}_{\ell}}{x_{\ell}} \leq-1 / 2$.

Using claim (b) above, we note that

$$
\int_{t=0}^{\infty} x_{\ell}(t) d t=-\int_{t=0}^{\infty} \partial_{t} D\left(\boldsymbol{x}^{\prime}, \boldsymbol{x}(t)\right) d t=D\left(\boldsymbol{x}^{\prime}, \boldsymbol{x}(i)\right)=\Phi_{i}\left(\boldsymbol{x}^{\prime}\right)-\Phi_{i}(\boldsymbol{x}(i))
$$

where the last equality uses that $\nabla \Phi_{i}(\boldsymbol{x}(i))$, which lies in $-N_{K(i)}$ by optimality of $\boldsymbol{x}(i)$, is orthogonal to $\boldsymbol{x}^{\prime}-\boldsymbol{x}(i)$.

To prove the deactivation bound (2.5'), in light of (2.9), it suffices to prove that

$$
\operatorname{OT}\left(\boldsymbol{x}(i), \boldsymbol{x}^{\prime}\right) \leq 4 d_{\ell} / w+4 \log w \int_{t=0}^{\infty} x_{\ell}(t) d t
$$

Since $x_{\ell}(t)$ is decreasing during the deletion of $\ell$, if $x_{\ell}(i)>1 / w^{2}$, there exists an instant $\bar{t}>0$ such that $x_{\ell}(t) \geq 1 / w^{2}$ for $t \leq \bar{t}$ and $x_{\ell}(t) \leq 1 / w^{2}$ for $t \geq \bar{t}$. If $x_{\ell}(i) \leq 1 / w^{2}$, we simply set $\bar{t}=0$. By the triangle inequality, $\operatorname{OT}\left(\boldsymbol{x}(i), \boldsymbol{x}^{\prime}\right) \leq \int_{t=0}^{\infty} \partial_{t} \operatorname{Cost}(t) d t$.
To achieve the above goal (2.10), it will thus suffice to prove the following two inequalities:

$$
\begin{aligned}
& \int_{t=0}^{\bar{t}} \partial_{t} \operatorname{Cost}(t) d t \leq 4 \log w \int_{t=0}^{\bar{t}} x_{\ell}(t) d t \\
& \int_{\bar{t}}^{\infty} \partial_{t} \operatorname{Cost}(t) d t \leq 4 d_{\ell} / w
\end{aligned}
$$

Applying Lemma 2.3 and Lemma 2.2 with $\gamma=\boldsymbol{e}_{\ell}$, the instantaneous cost during the deactivation of leaf $\ell$ satisfies

$$
\partial_{t} \operatorname{Cost}(t) \leq 2 x_{\ell} \sum_{u \in \ell \rightarrow r}\left(1-\frac{x_{u}}{x_{p(u)}}\right)
$$

In particular, this shows that $\partial_{t} \operatorname{Cost}(t) \leq 2 w x_{\ell}$, because $w$ bounds the number of non-zero terms in the above sum. Together with the claim (b) that $x_{\ell} \leq-2 d_{\ell} \dot{x}_{\ell}$, we get that

$$
\int_{\bar{t}}^{\infty} \partial_{t} \operatorname{Cost}(t) d t \leq-4 d_{\ell} w \int_{t=\bar{t}}^{\infty} \dot{x}_{\ell}(t) d t=4 d_{\ell} w x_{\ell}(\bar{t}) \leq 4 d_{\ell} / w
$$

which proves inequality (2.12).

To conclude, we now need to prove (2.11). Since $1-z \leq-\log z$ for all $z>0$,

$$
\sum_{u \in \ell \rightarrow r}\left(1-\frac{x_{u}}{x_{p(u)}}\right) \leq-\log \left(\prod_{u \in \ell \rightarrow r} \frac{x_{u}}{x_{p(u)}}\right)=-\log x_{\ell}
$$

For $t<\bar{t}$, since $x_{\ell}(t) \geq 1 / w^{2}$, we can further bound this by $2 \log w$.
In turn, using (2.13), we get that

$$
\partial_{t} \operatorname{Cost}(t) \leq 4 \log w \cdot x_{\ell}
$$

which completes the proof of inequality (2.11).

2.3 Growth phase In this section, we will account for the growth of the tree by interpolating continuously between the configuration defined after all deactivations, $\boldsymbol{x}^{\prime}$, and the configuration defined at the next layer, $\boldsymbol{x}(i+1)$. We consider the following trajectory, where we now have for $t \in(0,1]$,

$$
\boldsymbol{x}(t)=\underset{\boldsymbol{x} \in K(i+1)}{\arg \min } \Phi_{t}(\boldsymbol{x}) \quad \text { where } \quad \Phi_{t}(\boldsymbol{x})=\Phi_{i}(\boldsymbol{x})+t \sum_{\ell \in \mathcal{L}(i+1)} x_{\ell} \log x_{\ell}
$$

Taking the derivative with respect to $t$ in $\Phi_{t}(\boldsymbol{x}(t))$ yields, by the optimality of $\boldsymbol{x}(t)$, that $\partial_{t} \Phi_{t}(\boldsymbol{x}(t))=\sum_{\ell \in \mathcal{L}(i+1)} x_{\ell} \log x_{\ell} \geq-\log w$. Hence, $\Phi_{i+1}(\boldsymbol{x}(i+1))-\Phi_{i}\left(\boldsymbol{x}^{\prime}\right)=\int_{0}^{1} \partial_{t} \Phi_{t}(\boldsymbol{x}(t)) d t \geq-\log w$ yields inequality (2.7). To complete the analysis of the growth step, we must therefore prove inequality (2.6), stating that

$$
\operatorname{OT}\left(\boldsymbol{x}^{\prime}, \boldsymbol{x}(i+1)\right) \leq 2(1+\log w)^{2}+1
$$

Though the configuration $\boldsymbol{x}(t)$ is a member of $K(i+1)$, we will not pay for movements between layer $i$ and layer $i+1$ until the end of the growth phase.[^4] This is because we can effectively move the probability mass between points of $\mathcal{L}(i)$, and then move the mass from $\mathcal{L}(i)$ to $\mathcal{L}(i+1)$ at the very end of the elongation. This final movement is paid for by the $+1$ term in inequality (2.6).
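The bound $\sum_{\ell} x_{\ell} \log x_{\ell} \geq -\log w$ used for inequality (2.7) is the standard entropy bound: the negative entropy of a probability vector supported on at most $w$ leaves is minimized by the uniform distribution. A quick numeric check (an illustration, not part of the proof):

```python
import math
import random

# For any probability vector on at most w coordinates,
# sum_l x_l * log(x_l) >= -log(w), with equality at the uniform vector.
def neg_entropy(xs):
    return sum(x * math.log(x) for x in xs if x > 0)

w = 16
random.seed(0)
for _ in range(1000):
    raw = [random.random() for _ in range(w)]
    total = sum(raw)
    xs = [r / total for r in raw]          # random point of the simplex
    assert neg_entropy(xs) >= -math.log(w) - 1e-12

# Equality at the uniform distribution on w leaves.
assert abs(neg_entropy([1 / w] * w) + math.log(w)) < 1e-12
```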
Thus, to preserve (2.6), it suffices to show that the instantaneous movement cost, for $t \in[0,1]$, is bounded as

$$
\partial_{t} \operatorname{Cost}(t) \leq 2(1+\log w)^{2}
$$

Differentiating the optimality condition $\nabla \Phi_{t}(\boldsymbol{x}(t)) \in-N_{K(i+1)}$, we get that $\boldsymbol{x}(t)$ satisfies $\nabla^{2} \Phi_{t}(\boldsymbol{x}) \dot{\boldsymbol{x}}+\sum_{\ell \in \mathcal{L}(i+1)}\left(1+\log x_{\ell}\right) \boldsymbol{e}_{\ell} \in-N_{K(i+1)}$. Since $\sum_{\ell \in \mathcal{L}(i+1)} \boldsymbol{e}_{\ell} \in N_{K(i+1)}$, we can rewrite this expression as

$$
\nabla^{2} \Phi_{t}(\boldsymbol{x}) \dot{\boldsymbol{x}}+\sum_{\ell \in \mathcal{L}(i+1)} \log x_{\ell} \cdot \boldsymbol{e}_{\ell} \in-N_{K(i+1)}
$$

In light of Lemma 2.2, we have for all $u \in V(i)$,

$$
\partial_{t} \frac{x_{u}}{x_{p(u)}}=\frac{1}{h_{p(u)} x_{p(u)}}\left(\frac{x_{u}}{x_{p(u)}} \sum_{\ell \in \mathcal{L}_{p(u)}(i+1)} x_{\ell} \log x_{\ell}-\sum_{\ell \in \mathcal{L}_{u}(i+1)} x_{\ell} \log x_{\ell}\right)
$$

where we define $h_{p(u)}=h_{p(u)}(i)+t$.
By Lemma 2.3, we recall that the continuous movement is bounded by

$$
\begin{aligned}
\partial_{t} \operatorname{Cost}(t) & \leq 2 \sum_{u \in V(i) \backslash\{r\}} h_{p(u)} x_{p(u)}\left(\partial_{t} \frac{x_{u}}{x_{p(u)}}\right)_{-} \\
& =2 \sum_{u \in V(i) \backslash\{r\}}\left(\frac{x_{u}}{x_{p(u)}} \sum_{\ell \in \mathcal{L}_{p(u)}(i+1)} x_{\ell} \log x_{\ell}-\sum_{\ell \in \mathcal{L}_{u}(i+1)} x_{\ell} \log x_{\ell}\right)_{-}
\end{aligned}
$$

[^4]: More precisely, we initially pay only for the movement costs associated to the projection of $\boldsymbol{x}(t)$ onto $K(i)$.

For a node $p \in V(i) \backslash \mathcal{L}(i)$, let $c^{-}(p) \subset c(p)$ denote the set of children $u$ of $p$ that have a non-zero contribution to this sum, i.e., satisfying $\frac{x_{u}}{x_{p}} \sum_{\ell \in \mathcal{L}_{p}} x_{\ell} \log x_{\ell}-\sum_{\ell \in \mathcal{L}_{u}} x_{\ell} \log x_{\ell}<0$, and let $c^{+}(p)=c(p) \backslash c^{-}(p)$. Let $\mathcal{L}_{p}^{+}=\bigcup_{u \in c^{+}(p)} \mathcal{L}_{u}(i+1)$, $w_{p}^{+}=\left|\mathcal{L}_{p}^{+}\right|$ and $x_{p}^{+}=\sum_{u \in c^{+}(p)} x_{u}=\sum_{\ell \in \mathcal{L}_{p}^{+}} x_{\ell}$, and define $\mathcal{L}_{p}^{-}$, $w_{p}^{-}$ and $x_{p}^{-}$ similarly,
so that $x_{p}^{+}+x_{p}^{-}=x_{p}$ and $\mathcal{L}_{p}^{+} \cup \mathcal{L}_{p}^{-}=\mathcal{L}_{p}(i+1)$. Then

$$
\begin{aligned}
\sum_{u \in c(p)}\left(\frac{x_{u}}{x_{p}} \sum_{\ell \in \mathcal{L}_{p}(i+1)} x_{\ell} \log x_{\ell}-\sum_{\ell \in \mathcal{L}_{u}(i+1)} x_{\ell} \log x_{\ell}\right)_{-} & =\sum_{\ell \in \mathcal{L}_{p}^{-}} x_{\ell} \log x_{\ell}-\frac{x_{p}^{-}}{x_{p}} \sum_{\ell \in \mathcal{L}_{p}(i+1)} x_{\ell} \log x_{\ell} \\
& =\frac{x_{p}^{+}}{x_{p}} \sum_{\ell \in \mathcal{L}_{p}^{-}} x_{\ell} \log x_{\ell}-\frac{x_{p}^{-}}{x_{p}} \sum_{\ell \in \mathcal{L}_{p}^{+}} x_{\ell} \log x_{\ell} \\
& \leq \frac{x_{p}^{+} x_{p}^{-}}{x_{p}} \log x_{p}^{-}-\frac{x_{p}^{-} x_{p}^{+}}{x_{p}} \log \frac{x_{p}^{+}}{w_{p}^{+}} \\
& \leq \frac{x_{p}^{+} x_{p}^{-}}{x_{p}} \log \left(w_{p}^{+} w_{p}^{-}\right)
\end{aligned}
$$

where the first inequality uses that $\sum x_{\ell} \log x_{\ell}$ subject to the constraints $x \geq 0$ and $\sum x_{\ell}=x_{p}^{-}$ (resp. $\sum x_{\ell}=x_{p}^{+}$) is maximized (resp. minimized) when all mass is concentrated on a single leaf (resp. spread equally across the $w_{p}^{+}$ leaves), and the last inequality follows from the (yet unproven) relation

$$
x_{p}^{-} \leq x_{p}^{+} w_{p}^{-}
$$

To see that (2.17) is true, note that by Lemma 2.1, there exist $1 \leq y_{u} \leq\left|\mathcal{L}_{u}\right|$ such that

$$
\frac{x_{p}^{+}}{x_{p}}=\frac{\sum_{u \in c^{+}(p)} y_{u}}{\sum_{u \in c(p)} y_{u}} \geq \frac{1}{1+\sum_{u \in c^{-}(p)}\left|\mathcal{L}_{u}\right|}=\frac{1}{1+w_{p}^{-}}
$$

Applying this twice, we get

$$
\frac{x_{p}^{+} w_{p}^{-}}{x_{p}} \geq \frac{w_{p}^{-}}{1+w_{p}^{-}}=1-\frac{1}{1+w_{p}^{-}} \geq 1-\frac{x_{p}^{+}}{x_{p}}=\frac{x_{p}^{-}}{x_{p}}
$$

and multiplying by $x_{p}$ yields (2.17).

Combining (2.15) and (2.16) yields

$$
\partial_{t} \operatorname{Cost}(t) \leq 2 \sum_{p \in V(i) \backslash \mathcal{L}(i)} \frac{x_{p}^{+} x_{p}^{-}}{x_{p}} \log \left(w_{p}^{+} w_{p}^{-}\right)
$$

Finally, our objective (2.14) is implied by invoking the following lemma with $u=r$.

Lemma 2.5. For each $u \in V(i)$, writing $V_{u} \subseteq V(i)$ for the nodes in the subtree rooted at $u$ and $w_{u}=\left|\mathcal{L}_{u}(i+1)\right|$, it holds that

$$
\sum_{p \in V_{u} \backslash \mathcal{L}(i)} \frac{x_{p}^{+} x_{p}^{-}}{x_{p}} \log \left(w_{p}^{+} w_{p}^{-}\right) \leq x_{u}\left(1+\log w_{u}\right)^{2}
$$

Proof. We proceed by induction on the level of $u$. If $u \in \mathcal{L}(i)$, the inequality follows trivially since the left-hand side is an empty sum.

For the induction step, consider some $u \in V(i) \backslash \mathcal{L}(i)$.
Then

$$
\begin{aligned}
\sum_{p \in V_{u} \backslash \mathcal{L}(i)} \frac{x_{p}^{+} x_{p}^{-}}{x_{p}} \log \left(w_{p}^{+} w_{p}^{-}\right) & =\frac{x_{u}^{+} x_{u}^{-}}{x_{u}} \log \left(w_{u}^{+} w_{u}^{-}\right)+\sum_{v \in c(u)} \sum_{p \in V_{v} \backslash \mathcal{L}(i)} \frac{x_{p}^{+} x_{p}^{-}}{x_{p}} \log \left(w_{p}^{+} w_{p}^{-}\right) \\
& \leq \frac{x_{u}^{+} x_{u}^{-}}{x_{u}} \log \left(w_{u}^{+} w_{u}^{-}\right)+\sum_{v \in c(u)} x_{v}\left(1+\log w_{v}\right)^{2} \\
& \leq \frac{x_{u}^{+} x_{u}^{-}}{x_{u}} \log \left(w_{u}^{+} w_{u}^{-}\right)+x_{u}^{+}\left(1+\log w_{u}^{+}\right)^{2}+x_{u}^{-}\left(1+\log w_{u}^{-}\right)^{2}
\end{aligned}
$$

where the first inequality follows from the induction hypothesis. We may assume $w_{u}^{-} \geq 1$ and $w_{u}^{+} \geq 1$, since otherwise $x_{u}^{+}$ or $x_{u}^{-}$ is 0 and the lemma follows directly from the induction hypothesis applied to the children of $u$. Dividing by $x_{u}$ and writing $q:=\frac{x_{u}^{+}}{x_{u}}$, so that $1-q=\frac{x_{u}^{-}}{x_{u}}$, it remains to show that

$$
q(1-q) \log \left(w_{u}^{+} w_{u}^{-}\right)+q\left(1+\log w_{u}^{+}\right)^{2}+(1-q)\left(1+\log w_{u}^{-}\right)^{2} \leq\left(1+\log w_{u}\right)^{2}
$$

for all $q \in[0,1]$.

By concavity of $w \mapsto(1+\log w)^{2}$ for $w \geq 1$, we have

$$
q\left(1+\log w_{u}^{+}\right)^{2}+(1-q)\left(1+\log w_{u}^{-}\right)^{2} \leq\left(1+\log \left(q w_{u}^{+}+(1-q) w_{u}^{-}\right)\right)^{2}
$$

By symmetry, assume $w_{u}^{+} \geq w_{u}^{-}$, so we can write $w_{u}^{+}=(1+a) w_{u}^{-}$ for some $a \geq 0$.
Using $w_{u}=w_{u}^{+}+w_{u}^{-}$, and writing $d:=e w_{u}^{-}$ (where $e$ is the base of $\log$), inequality (2.18) follows if we show that

$$
q(1-q) \log \left((1+a) d^{2}\right)+\log ^{2}((1+a q) d) \leq \log ^{2}((2+a) d)
$$

for all $q \in[0,1]$, $a \geq 0$ and $d \geq e$. Indeed,

$$
\begin{aligned}
& q(1-q) \log \left((1+a) d^{2}\right)+\log ^{2}((1+a q) d)-\log ^{2}((2+a) d) \\
= & q(1-q)(2 \log d+\log (1+a))+2 \log d \cdot \log \frac{1+a q}{2+a}+\log ^{2}(1+a q)-\log ^{2}(2+a) \\
\leq & \log (2+a) \log \frac{1+a}{2+a}+\log (1+a q) \log \frac{1+a q}{1+a} \\
\leq & 0
\end{aligned}
$$

where inequality (2.19) uses

$$
q(1-q) \leq \min \left\{\log 2, \log \frac{1}{q}\right\} \leq \log \frac{2+a}{1+a q}
$$

Acknowledgements. The authors thank the anonymous reviewers for their helpful comments. RC thanks Laurent Massoulié for insightful discussions.

# References

[ACE+23a] Antonios Antoniadis, Christian Coester, Marek Eliás, Adam Polak, and Bertrand Simon. Mixing predictions for online metric algorithms. In International Conference on Machine Learning, ICML, 2023.

[ACE+23b] Antonios Antoniadis, Christian Coester, Marek Eliás, Adam Polak, and Bertrand Simon. Online metric algorithms with untrusted predictions. ACM Trans. Algorithms, 19(2), 2023.

[AGTG21] CJ Argue, Anupam Gupta, Ziye Tang, and Guru Guruganesh. Chasing convex bodies with linear competitive ratio. Journal of the ACM (JACM), 68(5), 2021.

[AKL+79] Romas Aleliunas, Richard M Karp, Richard J Lipton, László Lovász, and Charles Rackoff. Random walks, universal traversal sequences, and the complexity of maze problems. In 20th Annual Symposium on Foundations of Computer Science (SFCS 1979), 1979.

[BC22] Nikhil Bansal and Christian Coester. Online metric allocation and time-varying regularization.
In 30th Annual European Symposium on Algorithms, ESA, 2022.

[BCAGL23] Siddhartha Banerjee, Vincent Cohen-Addad, Anupam Gupta, and Zhouzi Li. Graph searching with predictions. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), 2023.

[BCL+18] Sébastien Bubeck, Michael B. Cohen, Yin Tat Lee, James R. Lee, and Aleksander Madry. k-server via multiscale entropic regularization. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2018.

[BCLL21] Sébastien Bubeck, Michael B. Cohen, James R. Lee, and Yin Tat Lee. Metrical task systems on trees via mirror descent and unfair gluing. SIAM J. Comput., 50(3), 2021.

[BCN14] Niv Buchbinder, Shahar Chen, and Joseph Naor. Competitive analysis via regularization. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2014.

[BCR22] Sébastien Bubeck, Christian Coester, and Yuval Rabani. Shortest paths without a map, but with an entropic regularizer. In 63rd Annual Symposium on Foundations of Computer Science, FOCS, 2022.

[BCR23] Sébastien Bubeck, Christian Coester, and Yuval Rabani. The randomized k-server conjecture is false! In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, STOC, 2023.

[BDHS23] Júlia Baligács, Yann Disser, Irene Heinrich, and Pascal Schweitzer. Exploration of graphs with excluded minors. In 31st Annual European Symposium on Algorithms (ESA), 2023.

[Bel63] Richard Bellman. An optimal search. SIAM Review, 5(3), 1963.

[BGMN19] Niv Buchbinder, Anupam Gupta, Marco Molinaro, and Joseph (Seffi) Naor. k-servers with a smile: Online algorithms via projections. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, 2019.

[BLLS19] Sébastien Bubeck, Yin Tat Lee, Yuanzhi Li, and Mark Sellke. Competitively chasing convex bodies.
In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2019.

[BLS92] Allan Borodin, Nathan Linial, and Michael E Saks. An optimal on-line algorithm for metrical task system. Journal of the ACM (JACM), 39(4), 1992.

[Bur96] William R Burley. Traversing layered graphs using the work function algorithm. Journal of Algorithms, 20(3), 1996.

[BYCR93] Ricardo A Baeza-Yates, Joseph C Culberson, and Gregory JE Rawlins. Searching in the plane. Information and Computation, 106(2), 1993.

[CK19] Christian Coester and Elias Koutsoupias. The online $k$-taxi problem. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC, 2019.

[CL93] Marek Chrobak and Lawrence L Larmore. Metrical Service System: Deterministic Strategies. Citeseer, 1993.

[CL22] Christian Coester and James R. Lee. Pure entropic regularization for metrical task systems. Theory Comput., 18, 2022.

[CM24] Romain Cosson and Laurent Massoulié. Collective tree exploration via potential function method. In 15th Innovations in Theoretical Computer Science Conference (ITCS 2024), 2024.

[CMV23] Romain Cosson, Laurent Massoulié, and Laurent Viennot. Efficient collaborative tree exploration with breadth-first depth-next. In 37th International Symposium on Distributed Computing (DISC 2023). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2023.

[Cos24] Romain Cosson. Ariadne and Theseus: Exploration and rendezvous with two mobile agents in an unknown graph. arXiv preprint arXiv:2403.07748, 2024.

[FFK+98] Amos Fiat, Dean P Foster, Howard Karloff, Yuval Rabani, Yiftach Ravid, and Sundar Vishwanathan. Competitive algorithms for layered graph traversal. SIAM Journal on Computing, 28(2), 1998.

[FGKP06] Pierre Fraigniaud, Leszek Gasieniec, Dariusz R. Kowalski, and Andrzej Pelc. Collective tree exploration. Networks, 48(3), 2006.

[KP94] Bala Kalyanasundaram and Kirk R Pruhs. Constructing competitive tours from local information.
Theoretical Computer Science, 130(1), 1994.

[Luc82] Édouard Lucas. Récréations mathématiques, volume 1. Gauthier-Villars, 1882.

[MMS12] Nicole Megow, Kurt Mehlhorn, and Pascal Schweitzer. Online graph exploration: New results on old and new algorithms. Theoretical Computer Science, 463, 2012.

[PY91] Christos H Papadimitriou and Mihalis Yannakakis. Shortest paths without a map. Theoretical Computer Science, 84(1), 1991.

[Ram95] Hariharan Ramesh. On traversing layered graphs on-line. J. Algorithms, 18(3), 1995.

[Sel20] Mark Sellke. Chasing convex bodies optimally. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA, 2020.

[Tar95] Gaston Tarry. Le problème des labyrinthes. Nouvelles annales de mathématiques: journal des candidats aux écoles polytechnique et normale, 14, 1895.

[YWB03] Vladimir Yanovski, Israel A Wagner, and Alfred M Bruckstein. A distributed ant algorithm for efficiently patrolling a network. Algorithmica, 37, 2003.

# A Simple results for unweighted layered graph traversal

This section discusses a few simple algorithms and bounds for traversing unweighted layered graphs. First, we give a tight bound on deterministic algorithms; then, we give a randomized lower bound and discuss two elementary randomized algorithms that fail to provide substantial competitive guarantees.

Proposition A.1. (Deterministic Tight Bound) For every $w \in \mathbb{N}$, the competitive ratio of deterministic unweighted layered graph traversal with width $w$ is exactly $2 w-1$.

The upper bound is achieved by a simple depth-first search. This algorithm uses the edges on the path to the target once, and other edges at most twice.
Since there are at most $w$ edges per layer, the total cost is at most $(2w-1)i$, so the competitive ratio is $2 w-1$.

For the lower bound, consider a star-shaped graph $G_{\text {star }}$ comprising a root $r$ and $w$ chains starting from $r$, as illustrated by Figure 2a. A deterministic algorithm must explore the chains' endpoints sequentially until it finds the chain containing the target. For any $\mathcal{A}$ and any depth $i$, we assign length $i-w+j$ to the $j$-th chain $\mathcal{A}$ would explore. This graph instance makes $\mathcal{A}$ traverse all $w$ chains, incurring a total cost of $(2 w-1) i-O\left(w^{2}\right)$. Letting $i \rightarrow \infty$ shows that no deterministic algorithm can have a competitive ratio less than $2 w-1$.

Figure 2: Three instances of unweighted layered graphs.

Proposition A.2. (Randomized Lower Bound) Any randomized algorithm for unweighted layered graph traversal has a competitive ratio of $\Omega(\log w)$.

The proof is analogous to [FFK+98, Theorem 12]. For any randomized algorithm, we consider its fractional counterpart $\mathcal{A}$. We apply $\mathcal{A}$ on the star-shaped graph $G_{\text {star }}$ as in Figure 2a. Initially, each chain has length $i-w+1$ for sufficiently large $i$. At layer $i-w+1$, at least one endpoint is occupied with probability at least $\frac{1}{w}$. We give that node no children in the next layer and extend the length of the remaining chains. We repeat this process on layers $i-w+2$ to $i$. Then, the expected cost incurred by $\mathcal{A}$ is $\Omega(i)$ times the sum, over each chain, of the probability that $\mathcal{A}$ visits its endpoint, which is $\sum_{j=1}^{w} \frac{1}{j}=\Omega(\log w)$.

To find efficient traversal algorithms, we examined various elementary ideas. Among them, none achieves a competitive ratio better than $\Omega(w)$, failing to improve on the deterministic DFS.

Random DFS Algorithm.
Perhaps the most immediate randomized generalization of depth-first search is the algorithm where the agent chooses the branch that it will explore next uniformly at random among the closest active branches, instead of always choosing one branch deterministically (a deterministic choice can be exploited by the adversary, as in the study of deterministic depth-first search above). It is easy to see that the fractional perspective on this algorithm corresponds at all times to the configuration $\boldsymbol{x}(i)$ defined inductively by $x_{r}=1$ and $\forall u \in V \backslash\{r\}, x_{u}=x_{p(u)} /|c(p(u))|$, where $|c(p(u))|$ denotes the number of active children of $p(u)$ at round $i$. We note that this method is $\Omega(w)$-competitive when applied to the "Comb" graph illustrated in Figure 2c. In this graph, each left-extending chain from the rightmost branch has length $w$, and the algorithm explores each dead end with probability one half. Consequently, the total expected cost for this algorithm is $\Omega(w i)$ at the $i$-th layer.

A slightly more elaborate lower bound shows that the algorithm that moves, upon reaching a dead end, to a random node that is at most some constant factor $L$ (possibly depending on $w$) further away than the closest active node in the new layer also has competitive ratio $\Omega(w)$.

Uniform Algorithm. The problem with the random depth-first search algorithm seems to be that it lets too much probability mass concentrate on a single leaf, and that the adversary is then encouraged to delete that precise leaf.
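The $\Omega(wi)$ behaviour of random DFS on the comb can be illustrated with a small Monte Carlo sketch. The model below is a simplified reading of Figure 2c (our assumption, not the paper's exact instance): a spine of $i$ layers with a dead-end chain of length $w$ branching off at every layer, each dead end entered with probability one half at a round-trip cost of $2w$.

```python
import random

# Monte Carlo sketch of random DFS on a simplified comb: a spine of i
# layers; at each layer a dead-end chain of length w branches off, and
# random DFS enters it with probability 1/2, paying 2*w extra steps
# (w steps in, w steps back out).
def random_dfs_cost(i, w, rng):
    cost = i                        # steps along the spine itself
    for _ in range(i):
        if rng.random() < 0.5:      # coin flip at each dead end
            cost += 2 * w
    return cost

rng = random.Random(0)
i, w, trials = 200, 10, 2000
avg = sum(random_dfs_cost(i, w, rng) for _ in range(trials)) / trials
# Under this model the expectation is i * (1 + w): linear in w,
# while the optimal offline cost is just i.
assert abs(avg - i * (1 + w)) < 0.05 * i * (1 + w)
```

This matches the $\Omega(wi)$ lower bound stated above for the fractional version of random DFS.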
One intuitive idea to overcome this issue is the algorithm that instead maintains a uniform probability distribution over the active leaves (moving from one layer to the next via an optimal transport coupling). This algorithm fails on a two-branch tree, as depicted in Figure 2b: the tree is designed so that the last layer alternates between $\{2,1\}$ and $\{1,2\}$ nodes on its two branches, and the constant probability of switching branches incurs a movement cost of $\Omega(i)$ at every step. Hence, this algorithm has an unbounded competitive ratio even for $w=3$.

## B Equivalent expressions for $\Phi$

The regularizer admits two equivalent expressions; we give a detailed derivation here.

$$
\begin{aligned}
\Phi(\boldsymbol{x})=\sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log \frac{x_{u}}{x_{p(u)}} & =\sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log x_{u}-\sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log x_{p(u)} \\
& =\sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log x_{u}-\sum_{u \in V \backslash \mathcal{L}} h_{u} x_{u} \log x_{u} \\
& =\sum_{u \in V \backslash\{r\}}\left(h_{p(u)}-h_{u}\right) x_{u} \log x_{u} \\
& =\sum_{u \in V} x_{u} \log x_{u}
\end{aligned}
$$

where we used $\log x_{r}=\log 1=0$, $h_{\ell}=0$ for $\ell \in \mathcal{L}$, and $h_{p(u)}-h_{u}=1$ for every $u \in V \backslash\{r\}$ since the tree is unweighted.

## C Proof of the explicit algorithm and basic lemmas

In this section, we restate and prove the explicit algorithm (Lemma 2.1) and two fundamental lemmas regarding the continuous dynamics and movement cost of evolving configurations from Section 2.1 (Lemmas 2.2 and 2.3).

Lemma C.1.
(Explicit algorithm) Let $\gamma \in \mathbb{R}^{\mathcal{L}}$ and

$$
\boldsymbol{x}=\underset{\boldsymbol{x} \in K}{\arg \min } \sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log \frac{x_{u}}{x_{p(u)}}+\sum_{\ell \in \mathcal{L}} \gamma_{\ell} x_{\ell}
$$

Then for each $u \in V \backslash\{r\}$,

$$
\frac{x_{u}}{x_{p(u)}}=\frac{y_{u}}{\sum_{v \in c(p(u))} y_{v}}
$$

where $y_{u}$ is defined recursively via

$$
y_{u}= \begin{cases}e^{-\gamma_{u} / h_{p(u)}} & \text { if } u \in \mathcal{L} \\ \left(\sum_{v \in c(u)} y_{v}\right)^{h_{u} / h_{p(u)}} & \text { if } u \notin \mathcal{L}\end{cases}
$$

In particular, if $\gamma=0$, then $1 \leq y_{u} \leq\left|\mathcal{L}_{u}\right|$.

Proof. Let

$$
f(\boldsymbol{x})=\sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log \frac{x_{u}}{x_{p(u)}}+\sum_{\ell \in \mathcal{L}} \gamma_{\ell} x_{\ell}
$$

Since $f$ is strictly convex, $\boldsymbol{x}$ is the unique point of $K$ satisfying the first-order optimality condition $\nabla f(\boldsymbol{x}) \in-N_{K}(\boldsymbol{x})$. Thus, it suffices to show that the point $\boldsymbol{x}$ defined via the ratios (C.1) satisfies this condition.

Clearly, it satisfies $\boldsymbol{x} \in K$.
Moreover,

$$
\begin{aligned}
\partial_{x_{u}} f(\boldsymbol{x}) & =h_{p(u)}\left(1+\log \frac{x_{u}}{x_{p(u)}}\right)-h_{u} \sum_{v \in c(u)} \frac{x_{v}}{x_{u}}+\gamma_{u} \mathbb{1}(u \in \mathcal{L}) \\
& =h_{p(u)}\left(1+\log y_{u}-\log \sum_{v \in c(p(u))} y_{v}\right)-h_{u}+\gamma_{u} \mathbb{1}(u \in \mathcal{L}) \\
& =h_{p(u)}+h_{p(u)} \log y_{u}-h_{p(p(u))} \log y_{p(u)}-h_{u}+\gamma_{u} \mathbb{1}(u \in \mathcal{L})
\end{aligned}
$$

Thus, letting $\lambda_{u}=h_{u}-h_{p(u)} \log y_{u}$ for $u \in V \backslash \mathcal{L}$ and $\lambda_{\ell}=0=-\gamma_{\ell}+h_{\ell}-h_{p(\ell)} \log y_{\ell}$ for $\ell \in \mathcal{L}$ (using $h_{\ell}=0$), we have

$$
\nabla f(\boldsymbol{x})=\sum_{u \in V \backslash\{r\}}\left(\lambda_{p(u)}-\lambda_{u}\right) \boldsymbol{e}_{u} \in-N_{K}(\boldsymbol{x})
$$

If $\gamma=0$, the inequalities that pertain to $y_{u}$ follow from the recursive definition by induction.

Lemma C.2. (Continuous dynamics) Let $\gamma:[0, \infty) \rightarrow \mathbb{R}^{\mathcal{L}}$ and $\boldsymbol{x}:[0, \infty) \rightarrow K$ be continuously differentiable and satisfy $x_{u}(t)>0$ for all $u \in V, t \in[0, \infty)$ and

$$
\nabla^{2} \Phi(\boldsymbol{x}(t)) \dot{\boldsymbol{x}}(t)+\sum_{\ell \in \mathcal{L}} \gamma_{\ell}(t) \boldsymbol{e}_{\ell} \in-N_{K}
$$

for all $t \in[0, \infty)$. Then for each node $u \in V \backslash\{r\}$,

$$
\partial_{t} \frac{x_{u}}{x_{p(u)}}=\frac{1}{h_{p(u)} x_{p(u)}}\left(\frac{x_{u}}{x_{p(u)}} \sum_{\ell \in \mathcal{L}_{p(u)}} x_{\ell} \gamma_{\ell}-\sum_{\ell \in \mathcal{L}_{u}} x_{\ell} \gamma_{\ell}\right)
$$

Proof.
The first- and second-order derivatives of $\Phi(\boldsymbol{x})=\sum_{u \in V \backslash\{r\}} h_{p(u)} x_{u} \log \frac{x_{u}}{x_{p(u)}}$ are given ${ }^{5}$ by

$$
\begin{aligned}
\partial_{x_{u}} \Phi(\boldsymbol{x}) & =h_{p(u)}\left(1+\log \frac{x_{u}}{x_{p(u)}}\right)-h_{u} \sum_{v \in c(u)} \frac{x_{v}}{x_{u}} \\
\partial_{x_{u} x_{u}} \Phi(\boldsymbol{x}) & =\frac{h_{p(u)}}{x_{u}}+h_{u} \sum_{v \in c(u)} \frac{x_{v}}{x_{u}^{2}} \\
\partial_{x_{u} x_{p(u)}} \Phi(\boldsymbol{x})=\partial_{x_{p(u)} x_{u}} \Phi(\boldsymbol{x}) & =-\frac{h_{p(u)}}{x_{p(u)}}
\end{aligned}
$$

with all other second-order derivatives being zero. Elements of the normal cone $N_{K}$ are of the form $\sum_{u \in V \backslash\{r\}}\left(\lambda_{u}-\lambda_{p(u)}\right) \boldsymbol{e}_{u}$, where $\lambda_{u} \in \mathbb{R}$ is a Lagrange multiplier corresponding to the constraint $x_{u}=\sum_{v \in c(u)} x_{v}$ for $u \in V \backslash \mathcal{L}$, and $\lambda_{\ell}=0$ for $\ell \in \mathcal{L}$. Equation (C.2) then becomes

$$
\nabla^{2} \Phi(\boldsymbol{x}(t)) \dot{\boldsymbol{x}}(t)=-\sum_{\ell \in \mathcal{L}} \gamma_{\ell}(t) \boldsymbol{e}_{\ell}+\sum_{u \in V \backslash\{r\}}\left(\lambda_{p(u)}(t)-\lambda_{u}(t)\right) \boldsymbol{e}_{u}
$$

[^0]: ${ }^{5}$ Note that the derivatives of the function $\sum_{u \in V} x_{u} \log x_{u}$ would be different, even though the two functions are equal on $K$. The reason is that, to calculate the derivatives, we need to consider $\Phi$ on an open neighborhood of $K$, such as $\mathbb{R}_{+}^{V}$, where the two definitions of $\Phi$ are no longer equivalent. For the same reason, replacing $\sum_{v \in c(u)} \frac{x_{v}}{x_{u}}$ by $1$ in the formula for the first-order derivative and calculating the second-order derivatives based on this would yield an incorrect Hessian.
Both formulations yield the same dynamics, though.

for some $\lambda:[0, \infty) \rightarrow \mathbb{R}^{V}$.

Extending the co-domain of $\gamma$ from $\mathbb{R}^{\mathcal{L}}$ to $\mathbb{R}^{V}$ by letting $\gamma_{u}(t)=\lambda_{u}(t)$ for $u \in V \backslash \mathcal{L}$, and using $\lambda_{\ell}=0$ for $\ell \in \mathcal{L}$, the row corresponding to $u$ can be written as

$$
\begin{aligned}
\gamma_{p(u)}-\gamma_{u} & =\partial_{x_{u} x_{u}} \Phi(\boldsymbol{x}) \dot{x}_{u}+\partial_{x_{u} x_{p(u)}} \Phi(\boldsymbol{x}) \dot{x}_{p(u)}+\sum_{v \in c(u)} \partial_{x_{u} x_{v}} \Phi(\boldsymbol{x}) \dot{x}_{v} \\
& =\left(\frac{h_{p(u)}}{x_{u}}+h_{u} \sum_{v \in c(u)} \frac{x_{v}}{x_{u}^{2}}\right) \dot{x}_{u}-\frac{h_{p(u)}}{x_{p(u)}} \dot{x}_{p(u)}-\frac{h_{u}}{x_{u}} \sum_{v \in c(u)} \dot{x}_{v} \\
& =\frac{h_{p(u)}}{x_{u}} \dot{x}_{u}-\frac{h_{p(u)}}{x_{p(u)}} \dot{x}_{p(u)} \\
& =\frac{h_{p(u)} x_{p(u)}}{x_{u}} \partial_{t} \frac{x_{u}}{x_{p(u)}}
\end{aligned}
$$

where the penultimate equation uses $\sum_{v \in c(u)} x_{v}=x_{u}$ and $\sum_{v \in c(u)} \dot{x}_{v}=\dot{x}_{u}$ for $u \in V \backslash \mathcal{L}$. To conclude the lemma, it suffices to show that

$$
\gamma_{u}=\sum_{\ell \in \mathcal{L}_{u}} \frac{x_{\ell} \gamma_{\ell}}{x_{u}}
$$

We show this by induction on the height of $u$. For $u \in \mathcal{L}$ this is immediate.
For $u \in V \backslash \mathcal{L}$, it is implied by the following chain of equations, where we use (C.3) and the induction hypothesis:

$$
0=h_{u} \sum_{v \in c(u)} \partial_{t} \frac{x_{v}}{x_{u}}=\sum_{v \in c(u)} \frac{x_{v}}{x_{u}}\left(\gamma_{u}-\gamma_{v}\right)=\gamma_{u}-\sum_{v \in c(u)} \frac{x_{v}}{x_{u}} \sum_{\ell \in \mathcal{L}_{v}} \frac{x_{\ell} \gamma_{\ell}}{x_{v}}=\gamma_{u}-\sum_{\ell \in \mathcal{L}_{u}} \frac{x_{\ell} \gamma_{\ell}}{x_{u}}
$$

Lemma C.3. (Movement cost) The instantaneous cost during continuous movement is at most

$$
2 \sum_{u \in V \backslash\{r\}} h_{p(u)} x_{p(u)}\left(\partial_{t} \frac{x_{u}}{x_{p(u)}}\right)_{-}
$$

where we write $(z)_{-}=\max \{-z, 0\}$.

Proof. Consider an infinitesimal time step, where the algorithm moves from configuration $\boldsymbol{x}(t)$ to configuration $\boldsymbol{x}(t+d t)$. Rather than going from $\boldsymbol{x}(t)$ to $\boldsymbol{x}(t+d t)$ directly, one could break the movement into pieces, where the algorithm goes through the vertices $u \in V \backslash \mathcal{L}$ one by one (bottom-up, say) and only updates the "conditional probabilities" $\frac{x_{v}}{x_{u}}$ for $v \in c(u)$. Since these conditional probabilities fully specify $\boldsymbol{x}$, the algorithm still reaches the same configuration $\boldsymbol{x}(t+d t)$ in the end. Hence, the combined cost of these movements upper-bounds the optimal transport cost from $\boldsymbol{x}(t)$ to $\boldsymbol{x}(t+d t)$.

Consider now a step where only the conditional probabilities of the children of a single vertex $u$ are updated, while all other conditional probabilities are held constant.
The instantaneous cost of this movement is

$$
2 h_{u} x_{u} \sum_{v \in c(u)}\left(\partial_{t} \frac{x_{v}}{x_{u}}\right)_{-}
$$

since $x_{u}\left(\partial_{t} \frac{x_{v}}{x_{u}}\right)_{-}$ is the rate at which mass moves from the leaves in $\mathcal{L}_{v}$ upwards to $u$, by a distance $h_{u}$, and, by conservation of mass, the same amount moves downwards the same distance to leaves below siblings of $v$. Summing over all $u \in V \backslash \mathcal{L}$ yields the lemma.
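As a numerical sanity check of Appendices B and C, the sketch below builds a tiny hypothetical tree, verifies that the two expressions for $\Phi$ agree on $K$, and checks that the recursion of Lemma C.1 (assuming the internal-node rule $y_u=(\sum_{v \in c(u)} y_v)^{h_u/h_{p(u)}}$ and $\gamma=0$) matches a brute-force minimization. This is an illustrative test, not code from the paper:

```python
# Toy tree: r -> {u, w}; u -> {a, b}; w -> {d}.
# Heights h decrease by 1 along every edge and vanish on leaves.
import math

children = {"r": ["u", "w"], "u": ["a", "b"], "w": ["d"],
            "a": [], "b": [], "d": []}
parent = {v: p for p, vs in children.items() for v in vs}
height = {"r": 2, "u": 1, "w": 1, "a": 0, "b": 0, "d": 0}

def phi_edges(x):
    # Phi as a sum over edges: sum_u h_{p(u)} x_u log(x_u / x_{p(u)}).
    return sum(height[parent[u]] * x[u] * math.log(x[u] / x[parent[u]])
               for u in x if u != "r")

def phi_nodes(x):
    # Equivalent expression on K (Appendix B): sum_u x_u log x_u.
    return sum(xu * math.log(xu) for xu in x.values())

# A configuration in K: children's masses sum to the parent's mass.
x = {"r": 1.0, "u": 0.6, "w": 0.4, "a": 0.25, "b": 0.35, "d": 0.4}
assert abs(phi_edges(x) - phi_nodes(x)) < 1e-9

# Lemma C.1 with gamma = 0: y = 1 on leaves, and on internal nodes
# (assumed rule) y_u = (sum of children's y)^(h_u / h_{p(u)}).
y_u = (1.0 + 1.0) ** (height["u"] / height["r"])  # sqrt(2)
y_w = 1.0 ** (height["w"] / height["r"])          # 1
x_u = y_u / (y_u + y_w)                           # x_u / x_r

# Brute force: with gamma = 0 the optimal split below u is uniform, so the
# objective reduces to g(t) = 2t log t + 2(1-t) log(1-t) - t log 2 in t = x_u.
g = lambda t: 2*t*math.log(t) + 2*(1-t)*math.log(1-t) - t*math.log(2)
best = min((g(i / 10**5), i / 10**5) for i in range(1, 10**5))[1]
assert abs(x_u - best) < 1e-3
```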