Explicit ↑ can trigger slow unification

### Prerequisites

Please put an X between the brackets as you perform the following steps:

* [X] Check that your issue is not already filed:
      https://github.com/leanprover/lean4/issues
* [X] Reduce the issue to a minimal, self-contained, reproducible test case.
      Avoid dependencies to Mathlib or Batteries.
* [X] Test your test case against the latest nightly release, for example on
      https://live.lean-lang.org/#project=lean-nightly
      (You can also use the settings there to switch to “Lean nightly”)

### Description

Using `↑x` in a term (that has an explicit expected type) at a point where the expected type of `↑x` is not yet known can cause unnecessary attempts at unification that can be very slow.


### Context

This was noticed because it can cause `apply?`/`exact?` to time out unexpectedly. This can be a serious drag on QoL when working interactively, as it can render `apply?`/`exact?` almost unusable. (And there is no work-around by writing terms differently; the terms are generated internally by `apply?`/`exact?`.)

Further investigation isolated the issue described here.

See [#mathlib4 > apply? failure @ 💬](https://leanprover.zulipchat.com/#narrow/channel/287929-mathlib4/topic/apply.3F.20failure/near/516740598), [#mathlib4 > Coercion triggers timeout @ 💬](https://leanprover.zulipchat.com/#narrow/channel/287929-mathlib4/topic/Coercion.20triggers.20timeout/near/517177117), and [#lean4 > Coercion and unification @ 💬](https://leanprover.zulipchat.com/#narrow/channel/270676-lean4/topic/Coercion.20and.20unification/near/517382951) (and further threads linked there) on Zulip for more details.

The issue [lean4#2831](https://github.com/leanprover/lean4/pull/2831) seems related. It appears that the slowness observed there is caused by the same phenomenon.


### Steps to Reproduce

Here is a Mathlib-free example that shows that minor differences in the code can make a big difference in elaboration time. (The heartbeat counts are with v4.20.0-rc5 on the web editor.)

```lean
@[irreducible] def A : Type := Unit

@[irreducible] def B : Type := Unit

unseal B in
@[coe] def AtoB (_a : A) : B := ()

instance : Coe A B where coe := AtoB

def dontUnfoldMe {α : Type} (x y : α) : Nat → List Bool → Nat
  | 0, _ => 0
  | n+1, l => dontUnfoldMe x y n (true::l) + dontUnfoldMe x y n (false::l)

theorem dontUnfoldMe_eq {α : Type} (x y : α) : ∀ n l, dontUnfoldMe x y n l = 0
  | 0, _ => rfl
  | n+1, l => by rw [dontUnfoldMe, dontUnfoldMe_eq x y n, dontUnfoldMe_eq x y n]

set_option trace.profiler true
set_option trace.profiler.useHeartbeats true
-- set_option trace.Meta.isDefEq true
-- set_option trace.Elab.coe true

-- 71k profiler heartbeats
example (b : B) (a : A) : dontUnfoldMe b (AtoB a) 14 [] = 0 :=
  dontUnfoldMe_eq b (↑a) 14 [] -- fast

-- 103165k (25787k for 12) profiler heartbeats
example (a : A) (b : B) : dontUnfoldMe (AtoB a) b 14 [] = 0 :=
  dontUnfoldMe_eq (↑a) b 14 [] -- slow (~ 2^m)

-- 71k profiler heartbeats
example (a : A) (b : B) : dontUnfoldMe (AtoB a) b 14 [] = 0 :=
  (dontUnfoldMe_eq (↑a) b 14 [] :) -- fast

-- 25k profiler heartbeats
example (a : A) (b : B) : dontUnfoldMe (AtoB a) b 14 [] = 0 :=
  dontUnfoldMe_eq .. -- fast

-- 152555k (10), 50889k (9) profiler heartbeats
example (a : A) (b : B) : dontUnfoldMe (AtoB a) b 10 [] = 0 :=
  dontUnfoldMe_eq (↑a) b .. -- slower (~ 3^m)

-- 30495k (7), 10473k (6) profiler heartbeats
-- fails (type mismatch)
example (a : A) (b : B) : dontUnfoldMe (AtoB a) b 7 [] = 0 :=
  dontUnfoldMe_eq (↑a) .. -- slower (~ 3^m)

-- 25k profiler heartbeats
example (a : A) (b : B) : dontUnfoldMe (AtoB a) b 14 [] = 0 :=
  dontUnfoldMe_eq _ b .. -- fast

-- 66k profiler heartbeats
example (a : A) (b : B) : dontUnfoldMe (AtoB a) b 14 [] = 0 :=
  dontUnfoldMe_eq (α := B) a .. -- fast
```

**Interpretation:**

* When the expected type of `↑a` is known when this term is elaborated, the coercion `AtoB` is found, and `↑a` is replaced by `AtoB a`. This leads to an (expected) fast final type-check.
* When the expected type of `↑a` is not (yet) known, `↑a` gets replaced by `?m.abc : ?m.xyz`, where `?m.abc` is an *unassignable* metavariable (and `?.xyz` is assignable). In particular `?m.abc` does not unify with `AtoB a`. When Lean tries to unify the type of `dontUnfoldMe_eq ...` with the expected type `dontUnfoldMe (AtoB a) n 14 []`, this forces it to keep unfolding `dontUnfoldMe` all the way down, until unification either succeeds (because the metavariable gets eliminated in the end) or fails.
* However, before we try to unify the types, we actually know the expected type of `↑a` (because it is determined by the type `B` of `b`), so Lean *could* go back and check for the coercion to be used, thus avoiding the slow unification attempt.
* When the given term is `(dontUnfoldMe_eq (↑a) b 14 [] :)`, it is elaborated without expected type; at the end of that, Lean goes back to `↑a` and finds the coercion, resulting in the correct type without unsassignable metavariables.

Compare (parts of) the trace for `(dontUnfoldMe_eq (↑a) b 14 [] :)`:
```text
    [step] [44453.000000] expected type: dontUnfoldMe (↑a) b 14 [] = 0, term
        (dontUnfoldMe_eq (↑a) b 14 [] :) ▼
      # Elaborate the definition without expected type first
      [] [5440.000000] expected type: <not-available>, term
          dontUnfoldMe_eq (↑a) b 14 [] ▼
        [] [216.000000] expected type: ?m.263087, term
            (↑a) ▼
          [] [90.000000] expected type: ?m.263087, term
              ↑a
        # At this point, we don't know yet which type `↑a` is supposed to have
        # --> replace by `?m.263088 : ?m.263087` where `?m.263088` is unassignable
        [Meta.isDefEq] [7.000000] ✅️ ?m.263087 =?= ?m.263087
        [] [569.000000] expected type: B, term
            b ▼
          [Meta.isDefEq] [287.000000] ✅️ ?m.263087 =?= B ▼
            # We know that the second argument `b` has type `B` and
            # that the first argument has the same type
            # --> assign `?m.263087 := B`
            [] ?m.263087 [assignable] =?= B [nonassignable]
            [assign] [198.000000] ✅️ ?m.263087 := B ▶
      # Now we go back to `↑a`; it has expected type `B`, and we find the coercion.
      [] [37967.000000] expected type: B, term
          ↑a ▼
        [] [197.000000] expected type: <not-available>, term
            a
        [Meta.isDefEq] [83.000000] ❌️ A =?= B
        [coe] [37451.000000] adding coercion for a : A =?= B ▶
      # Finally, we check that the types agree, which is now trivial, as they are the same.
      [Meta.isDefEq] [544.000000] ✅️ dontUnfoldMe (↑a) b 14 [] = 0 =?= dontUnfoldMe (↑a) b 14 [] = 0 ▼
        [] [14.000000] ✅️ dontUnfoldMe (↑a) b 14 [] =?= dontUnfoldMe (↑a) b 14 []
        [] [10.000000] ✅️ 0 =?= 0
        [] [7.000000] ✅️ Nat =?= Nat
    [Meta.isDefEq] [430.000000] ✅️ dontUnfoldMe (↑a) b 14 [] = 0 =?= dontUnfoldMe (↑a) b 14 [] = 0 ▶
```

with that for `dontUnfoldMe_eq (↑a) b 14 []`:
```text
    # Elaborate the definition with expected type `dontUnfoldMe (↑a) b 14 [] = 0`
    [step] [54948707.000000] expected type: dontUnfoldMe (↑a) b 14 [] = 0, term
        dontUnfoldMe_eq (↑a) b 14 [] ▼
      [] [216.000000] expected type: ?m.749, term
          (↑a) ▼
        [] [90.000000] expected type: ?m.749, term
            ↑a
      # At this point, we don't know yet which type `↑a` is supposed to have
      # --> replace by `?m.750 : ?m.749` where `?m.750` is unassignable
      [Meta.isDefEq] [7.000000] ✅️ ?m.749 =?= ?m.749
      [] [568.000000] expected type: B, term
          b ▼
        [Meta.isDefEq] [286.000000] ✅️ ?m.749 =?= B ▼
          # We know that the second argument `b` has type `B` and
          # that the first argument has the same type
          # (*) --> assign `?m.749 := B`
          [] ?m.749 [assignable] =?= B [nonassignable]
          [assign] [197.000000] ✅️ ?m.749 := B ▶
      # Now check if the type of the example can be the same as the type of its definition
      [Meta.isDefEq] [54943086.000000] ✅️ dontUnfoldMe (↑a) b 14 [] = 0 =?= dontUnfoldMe ?m.750 b 14 [] = 0 ▼
        # This unification only succeeds after unfolding `dontUnfoldMe` all the way down,
        # taking exponential time.
        # (In most similar "real-life" situations, the unification will fail because `?m.750`
        #  is unassignable and so, as long as it occurs in the term on the left, the problem
        #  has no solution. In particular, `?m.750` does not get assigned to `AtoB a`,
        #  which occurs at the corresponding position on the right.)
        [] [54942546.000000] ✅️ dontUnfoldMe (↑a) b 14 [] =?= dontUnfoldMe ?m.750 b 14 [] ▼
          [delta] [340.000000] ❌️ dontUnfoldMe (↑a) b 14 [] =?= dontUnfoldMe ?m.750 b 14 [] ▶
          # Now we start unfolding...
            [] [54940326.000000] ✅️ dontUnfoldMe (↑a) b 13 [true] +
                dontUnfoldMe (↑a) b 13 [false] =?= dontUnfoldMe ?m.750 b 13 [true] + dontUnfoldMe ?m.750 b 13 [false] ▼
            [delta] [54939810.000000] ✅️ dontUnfoldMe (↑a) b 13 [true] +
              [] [27460189.000000] ✅️ dontUnfoldMe (↑a) b 13 [true] =?= dontUnfoldMe ?m.750 b 13 [true] ▶
              [] [27478501.000000] ✅️ dontUnfoldMe (↑a) b 13 [false] =?= dontUnfoldMe ?m.750 b 13 [false] ▶
    # After the unification has succeeded, there is another check that the types agree,
    # this time with sides switched, which takes equally long.
    [Meta.isDefEq] [52116086.000000] ✅️ dontUnfoldMe ?m.750 b 14 [] = 0 =?= dontUnfoldMe (↑a) b 14 [] = 0 ▶
    # Only now do we go back and look at the term `↑a` in `dontUnfoldMe_eq (↑a) b 14 []` again.
    # We now know that the expected type is `B`, so we can go and find the coercion `AtoB`.
    # But note that we knew that already at point (*) above!
    [step] [37987.000000] expected type: B, term
        ↑a ▼
      [] [197.000000] expected type: <not-available>, term
          a
      [Meta.isDefEq] [83.000000] ❌️ A =?= B
      [coe] [37471.000000] adding coercion for a : A =?= B ▶
```

**Expected behavior:** Elaboration speed should not heavily depend on the order of the arguments `↑a` and `b`.

**Actual behavior:** Elaboration can be *much* slower when `↑a` precedes `b`.

### Versions

v4.20.0-rc5 on live.lean-lang.org.
v4.19.0 on my laptop (Linux Debian 6.1.115-1 (2024-11-01)).

### Additional Information

[Jovan Gerbscheid](https://github.com/JovanGerb) has a PR [lean#8294](https://github.com/leanprover/lean4/pull/8294) that improves caching of unification results (and failures, I assume). This mitigates the problem to some extent (the slow unification attempts caused by this seem to involve quite a bit of repetition in real-life cases), but does not fix its root cause.


### Proposed Fix

**Disclaimer:** I do not know how the unification algorithm works in detail, and so I have no idea whether the following is practical.

The suggestion would be to keep track of when the type metavariable of the unassignable metavariable that stands for `↑x` is assigned and at this point go back and re-elaborate `↑x` with the now known expected type (so that hopefully the correct coercion can be found and the unassignable metavariable eliminated).


### Impact

Add :+1: to [issues you consider important](https://github.com/leanprover/lean4/issues?q=is%3Aissue+is%3Aopen+sort%3Areactions-%2B1-desc). If others are impacted by this issue, please ask them to add :+1: to it.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explicit ↑ can trigger slow unification #8364

Prerequisites

Description

Context

Steps to Reproduce

Versions

Additional Information

Proposed Fix

Impact

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Explicit ↑ can trigger slow unification #8364

Description

Prerequisites

Description

Context

Steps to Reproduce

Versions

Additional Information

Proposed Fix

Impact

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions