diff --git a/ROADMAP.md b/ROADMAP.md index d278bfc47..3d23cf995 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -513,6 +513,25 @@ gaps that must be fixed first. - [x] Phase 7 — `RegexIterator`, `RecursiveRegexIterator` - [x] Phase 8 — file/directory iterators: `SplFileInfo`, `SplFileObject`, `SplTempFileObject`, `DirectoryIterator`, `FilesystemIterator`, `GlobIterator`, `RecursiveDirectoryIterator`, `RecursiveCachingIterator` +### Array builtin parity (key/list helpers, associative set-ops, recursive merge/walk) + +Well-bounded PHP-visible array builtins added before the backend migration. All dual-target (ARM64 + x86_64), with codegen and error tests, runtime-GC verified. + +- [x] `array_key_first()` / `array_key_last()` (PHP 7.3) — first/last key in insertion order, boxed as `Mixed`, `null` for empty arrays +- [x] `array_is_list()` (PHP 8.1) — sequential `0..n-1` key check (indexed arrays are lists by construction; associative arrays walk the insertion-order chain) +- [x] `array_replace()` — right-wins key merge over associative arrays (clone + `__rt_hash_set` overwrite) +- [x] `array_replace_recursive()` — recursive right-wins merge, recursing when both values at a key are associative arrays +- [x] `array_diff_assoc()` / `array_intersect_assoc()` — key + string-cast-value comparison via a unified `__rt_assoc_diff_intersect` helper +- [x] `array_merge_recursive()` — integer-key renumbering, string-key collisions recurse (both arrays) or combine into a list (scalars) +- [x] `array_walk_recursive()` — invokes the callback on each non-array leaf, recursing through nested indexed/associative arrays +- [x] `array_find()` / `array_any()` / `array_all()` (PHP 8.4) — predicate callbacks; find returns the first match or `null`, any/all return booleans +- [x] `array_udiff()` / `array_uintersect()` — difference/intersection with a user comparator (`$cmp($a, $b) === 0`) +- [x] `array_multisort()` — sort the first indexed array ascending (stable) and reorder a second array in 
tandem, both by reference (two scalar-element arrays; flags/descending/multi-key are follow-ups) +- [x] Scalar indexed-array inputs for the hash-based functions (`array_replace`, `array_replace_recursive`, `array_diff_assoc`, `array_intersect_assoc`, `array_merge_recursive`) — converted to integer-keyed hashes via `__rt_array_to_hash`; result key/value widen to `Mixed` for heterogeneous (indexed + string-keyed) inputs so `foreach` dispatches keys correctly (string/heap element indexed inputs are a follow-up — they hit x86-specific converter/clone-shallow issues) +- [x] Hash-based builtins persist string values (instead of incref-sharing) when building results, so results own their string payloads independently of the source/temporary inputs (fixes a latent use-after-free when an input is freed before the result) +- [x] `array_merge_recursive()` string-scalar combine fix — combined-list string values are persisted (independent copies) instead of incref-shared, so the temporary wrappers can be released without corrupting the result +- Hash-based functions accept associative arrays and scalar-element indexed arrays; string/heap-element indexed inputs and the callback/sort functions' element-type limits are documented in `docs/php/arrays.md` + ## v0.24.x — EIR introduction and register allocation Introduce a domain-specific intermediate representation (EIR) between the diff --git a/docs/php/arrays.md b/docs/php/arrays.md index 0d57af0f5..12308ff09 100644 --- a/docs/php/arrays.md +++ b/docs/php/arrays.md @@ -152,11 +152,17 @@ PHP does not allow keyed and unkeyed entries in the same destructuring pattern, | `array_keys()` | `array_keys($arr): array` | Returns the array keys | | `array_values()` | `array_values($arr): array` | Returns copy of values | | `array_key_exists()` | `array_key_exists($key, $arr): bool` | Check if key exists | +| `array_key_first()` | `array_key_first($arr): int\|string\|null` | First key in insertion order, or `null` if the array is empty | +| 
`array_key_last()` | `array_key_last($arr): int\|string\|null` | Last key in insertion order, or `null` if the array is empty | +| `array_is_list()` | `array_is_list($arr): bool` | `true` if the keys are exactly `0..count-1` in order (the empty array is a list) | | `array_search()` | `array_search($needle, $arr): int\|string\|false` | Search for value, returning an integer index for indexed arrays, the first matching associative-array key, or `false` if not found | | `array_slice()` | `array_slice($arr, $offset [, $length]): array` | Extract a slice | | `array_splice()` | `array_splice($arr, $offset [, $length]): array` | Remove a slice in place and return the removed elements | | `array_chunk()` | `array_chunk($arr, $size): array` | Split into chunks | | `array_merge()` | `array_merge($arr1, $arr2): array` | Merge two arrays | +| `array_merge_recursive()` | `array_merge_recursive($arr1, $arr2): array` | Recursively merge two arrays: integer keys append (renumbered), string keys that collide recurse when both values are arrays and otherwise combine into a list. Accepts associative arrays or **indexed arrays of scalars** (int/float/bool); nested indexed-array values are treated as opaque. | +| `array_replace()` | `array_replace($arr, $replacements): array` | Overwrite matching keys in `$arr` (in place, keeping position) and append new keys from `$replacements`; later values win. Accepts associative arrays or **indexed arrays of scalars** (int/float/bool). | +| `array_replace_recursive()` | `array_replace_recursive($arr, $replacements): array` | Like `array_replace()`, but when both values at a key are associative arrays they are merged recursively instead of overwritten. Accepts associative arrays or **indexed arrays of scalars** (int/float/bool); nested indexed arrays are overwritten, not merged. 
| | `array_combine()` | `array_combine($keys, $values): array` | Create array from keys/values | | `array_fill()` | `array_fill($start, $num, $value): array` | Fill with values | | `array_fill_keys()` | `array_fill_keys($keys, $value): array` | Fill with values using keys | @@ -166,6 +172,10 @@ PHP does not allow keyed and unkeyed entries in the same destructuring pattern, | `array_intersect()` | `array_intersect($arr1, $arr2): array` | Values in both | | `array_diff_key()` | `array_diff_key($arr1, $arr2): array` | Keys in $arr1 not in $arr2 | | `array_intersect_key()` | `array_intersect_key($arr1, $arr2): array` | Keys in both | +| `array_udiff()` | `array_udiff($arr1, $arr2, $cmp): array` | Values in $arr1 not in $arr2, equality decided by the two-argument comparator (`$cmp($a, $b) === 0`). Supports string / function / non-capturing closure comparators. | +| `array_uintersect()` | `array_uintersect($arr1, $arr2, $cmp): array` | Values in both arrays, equality decided by the comparator (`$cmp($a, $b) === 0`). | +| `array_diff_assoc()` | `array_diff_assoc($arr1, $arr2): array` | Entries of $arr1 whose `(key, value)` pair is absent from $arr2 (values compared as `(string)$a === (string)$b`). Accepts associative arrays or **indexed arrays of scalars** (int/float/bool). | +| `array_intersect_assoc()` | `array_intersect_assoc($arr1, $arr2): array` | Entries of $arr1 whose `(key, value)` pair is present in $arr2 (values compared as strings). Accepts associative arrays or **indexed arrays of scalars** (int/float/bool). 
| | `array_unique()` | `array_unique($arr): array` | Remove duplicates | | `array_reverse()` | `array_reverse($arr): array` | Reverse order | | `array_flip()` | `array_flip($arr): array` | Exchange keys and values, normalizing integer and numeric-string result keys | @@ -185,6 +195,11 @@ PHP does not allow keyed and unkeyed entries in the same destructuring pattern, | `shuffle()` | `shuffle($arr): void` | Randomly shuffle (in-place) | | `array_rand()` | `array_rand($arr): int` | Pick one random key | | `array_map()` | `array_map($callback, $arr): array` | Apply callback to each element | +| `array_walk_recursive()` | `array_walk_recursive($arr, $callback): void` | Apply `$callback` to each non-array leaf value, recursing into nested indexed/associative arrays. Leaf values must share a scalar type (consistent with `array_walk`: leaf passed by value, no key argument). | +| `array_multisort()` | `array_multisort($arr1, $arr2): bool` | Sort `$arr1` ascending (stable) and reorder `$arr2` in tandem; both are sorted in place (by reference). **Two indexed arrays of scalar elements**; sort flags, descending order, and >2 arrays are follow-ups. | +| `array_find()` | `array_find($arr, $callback): mixed` | (PHP 8.4) Returns the first element for which `$callback($value)` is truthy, or `null` if none match. | +| `array_any()` | `array_any($arr, $callback): bool` | (PHP 8.4) `true` if `$callback($value)` is truthy for at least one element. | +| `array_all()` | `array_all($arr, $callback): bool` | (PHP 8.4) `true` if `$callback($value)` is truthy for every element. 
| | `array_filter()` | `array_filter($arr, $callback): array` | Filter where callback is truthy | | `array_reduce()` | `array_reduce($arr, $callback, $init): int` | Reduce to single value | | `array_walk()` | `array_walk($arr, $callback): void` | Call callback on each element | diff --git a/examples/assoc-arrays/main.php b/examples/assoc-arrays/main.php index bfb36d2fc..67359c62a 100644 --- a/examples/assoc-arrays/main.php +++ b/examples/assoc-arrays/main.php @@ -106,3 +106,41 @@ echo "\n"; } echo "As JSON: " . json_encode($profile) . "\n"; + +// First and last keys, and list-shape detection +$ranking = ["gold" => 1, "silver" => 2, "bronze" => 3]; +echo "\nFirst key: " . array_key_first($ranking) . "\n"; +echo "Last key: " . array_key_last($ranking) . "\n"; +echo "Ranking is a list? " . (array_is_list($ranking) ? "yes" : "no") . "\n"; +echo "[10,20,30] is a list? " . (array_is_list([10, 20, 30]) ? "yes" : "no") . "\n"; + +// array_replace: later values win, keys keep their position +$config = ["host" => "localhost", "port" => 8080, "debug" => 0]; +$patched = array_replace($config, ["port" => 9090, "debug" => 1]); +echo "\nPatched config:\n"; +foreach ($patched as $key => $value) { + echo " " . $key . " = " . $value . "\n"; +} + +// array_merge_recursive: nested arrays merge instead of being overwritten +$a = ["limits" => ["cpu" => 1], "tags" => ["a" => 1]]; +$b = ["limits" => ["mem" => 2], "tags" => ["b" => 2]]; +$merged = array_merge_recursive($a, $b); +echo "\nRecursively merged limits:\n"; +foreach ($merged["limits"] as $key => $value) { + echo " " . $key . " = " . $value . "\n"; +} + +// array_diff_assoc / array_intersect_assoc compare both key and value +$left = ["a" => 1, "b" => 2, "c" => 3]; +$right = ["a" => 1, "b" => 9]; +echo "\nDiff (kept from left): " . count(array_diff_assoc($left, $right)) . " entries\n"; +echo "Intersect (in both): " . count(array_intersect_assoc($left, $right)) . 
" entries\n"; + +// The hash-based functions also accept plain indexed arrays of scalars: the +// indexed input is treated as an integer-keyed map (key 0, 1, 2, ...). +$levels = array_replace([10, 20, 30], [1 => 99]); +echo "\nPatched levels:\n"; +foreach ($levels as $index => $level) { + echo " [" . $index . "] = " . $level . "\n"; +} diff --git a/src/codegen/builtins/arrays/array_find_any_all.rs b/src/codegen/builtins/arrays/array_find_any_all.rs new file mode 100644 index 000000000..2611e937c --- /dev/null +++ b/src/codegen/builtins/arrays/array_find_any_all.rs @@ -0,0 +1,118 @@ +//! Purpose: +//! Emits PHP `array_find`, `array_any`, and `array_all` (PHP 8.4) predicate builtins. +//! Resolves the predicate callback and dispatches to the unified `__rt_array_find_any_all` helper. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! - Reuses the array-callback machinery for string and closure/function callbacks; a mode selects find/any/all. + +use crate::codegen::abi; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::codegen::expr::emit_expr; +use crate::parser::ast::Expr; +use crate::types::PhpType; +use super::callback_env; +use super::runtime_string_callback; + +/// Emits the PHP 8.4 `array_find` / `array_any` / `array_all` predicate builtins. +/// +/// Evaluates the array (first arg), then resolves the predicate callback. The unified +/// runtime helper `__rt_array_find_any_all` receives `(callback, array, env, mode)` where +/// mode is `0` (find — returns the first matching element or `null`), `1` (any — boolean), +/// or `2` (all — boolean). +/// +/// Supports string callbacks (`"is_positive"`) and closures / plain function callbacks, +/// covering the dominant predicate usage. Operates on indexed arrays with scalar elements +/// (consistent with `array_filter`). 
+/// +/// # Returns +/// `Some(PhpType::Mixed)` for `array_find` (element or null), `Some(PhpType::Bool)` for +/// `array_any` / `array_all`. +pub fn emit( + name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + let mode: i64 = match name { + "array_any" => 1, + "array_all" => 2, + _ => 0, + }; + let ret_ty = if name == "array_find" { + PhpType::Mixed + } else { + PhpType::Bool + }; + emitter.comment(&format!("{}()", name)); + + let call_reg = abi::nested_call_reg(emitter); + let result_reg = abi::int_result_reg(emitter); + let cb_arg = abi::int_arg_reg_name(emitter.target, 0); + let arr_arg = abi::int_arg_reg_name(emitter.target, 1); + let env_arg = abi::int_arg_reg_name(emitter.target, 2); + let mode_arg = abi::int_arg_reg_name(emitter.target, 3); + + // -- evaluate the array argument, then the callback (PHP source order) -- + let arr_ty = emit_expr(&args[0], emitter, ctx, data); + let elem_ty = match &arr_ty { + PhpType::Array(elem) => elem.codegen_repr(), + _ => PhpType::Int, + }; + abi::emit_push_reg(emitter, result_reg); // push the source array pointer onto the temporary stack + + // -- string callback path ("is_positive") -- + if runtime_string_callback::emit_after_saved_array( + &args[1], + Some(&arr_ty), + vec![elem_ty.clone()], + PhpType::Bool, + arr_arg, + emitter, + ctx, + data, + |wrapper, emitter, _ctx, _data| { + callback_env::load_env_slot_to_reg(emitter, arr_arg, wrapper.array_slot_offset); + abi::emit_symbol_address(emitter, cb_arg, &wrapper.wrapper_label); + callback_env::load_env_pointer_to_reg(emitter, env_arg); + abi::emit_load_int_immediate(emitter, mode_arg, mode); + abi::emit_call_label(emitter, "__rt_array_find_any_all"); + }, + ) { + return Some(ret_ty); + } + + // -- closure / plain function callback path -- + let captures = + callback_env::materialize_callback_address(&args[1], call_reg, emitter, ctx, data); + if captures.is_empty() { + abi::emit_pop_reg(emitter, arr_arg); 
// pop the source array pointer into the array argument register + emitter.instruction(&format!("mov {}, {}", cb_arg, call_reg)); // move the callback function address into the callback argument register + abi::emit_load_int_immediate(emitter, env_arg, 0); + abi::emit_load_int_immediate(emitter, mode_arg, mode); + abi::emit_call_label(emitter, "__rt_array_find_any_all"); + } else { + abi::emit_pop_reg(emitter, result_reg); // recover the source array pointer before building the capture environment + let wrapper = callback_env::emit_captured_callback_env( + call_reg, + result_reg, + &captures, + vec![elem_ty], + emitter, + ctx, + ); + callback_env::load_env_slot_to_reg(emitter, arr_arg, wrapper.array_slot_offset); + abi::emit_symbol_address(emitter, cb_arg, &wrapper.wrapper_label); + callback_env::load_env_pointer_to_reg(emitter, env_arg); + abi::emit_load_int_immediate(emitter, mode_arg, mode); + abi::emit_call_label(emitter, "__rt_array_find_any_all"); + abi::emit_release_temporary_stack(emitter, wrapper.env_bytes); + } + + Some(ret_ty) +} diff --git a/src/codegen/builtins/arrays/array_is_list.rs b/src/codegen/builtins/arrays/array_is_list.rs new file mode 100644 index 000000000..cd05b088b --- /dev/null +++ b/src/codegen/builtins/arrays/array_is_list.rs @@ -0,0 +1,63 @@ +//! Purpose: +//! Emits PHP `array_is_list` builtin calls. +//! Returns a boolean indicating whether an array has sequential 0..n-1 integer keys. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! - Indexed arrays are lists by construction; associative arrays and Mixed values defer to the runtime walk. + +use crate::codegen::abi; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::codegen::expr::emit_expr; +use crate::codegen::platform::Arch; +use crate::parser::ast::Expr; +use crate::types::PhpType; + +/// Emits code for the PHP `array_is_list` builtin. 
+/// +/// `array_is_list($array)` returns `true` when the array's keys are exactly the +/// integers `0..count-1` in order (the empty array is a list), and `false` +/// otherwise. +/// +/// # Codegen +/// - Evaluates `args[0]` into the container register. +/// - For a statically indexed `PhpType::Array`, loads the constant `1`: indexed +/// arrays always have sequential keys, so the runtime walk is skipped. +/// - For associative arrays and `Mixed` values, calls `__rt_array_is_list`, which +/// reads the heap kind, walks the hash insertion-order chain, and unwraps boxed +/// array payloads. +/// +/// # Returns +/// `Some(PhpType::Bool)` — the list-shape predicate result in the integer result register. +pub fn emit( + _name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + emitter.comment("array_is_list()"); + let arr_ty = emit_expr(&args[0], emitter, ctx, data); + + if matches!(arr_ty, PhpType::Array(_)) { + if emitter.target.arch == Arch::X86_64 { + emitter.instruction("mov eax, 1"); // indexed arrays always have sequential 0..n-1 keys + } else { + emitter.instruction("mov x0, #1"); // indexed arrays always have sequential 0..n-1 keys + } + return Some(PhpType::Bool); + } + + if emitter.target.arch == Arch::X86_64 { + emitter.instruction("mov rdi, rax"); // move the container pointer into the first x86_64 argument register + abi::emit_call_label(emitter, "__rt_array_is_list"); + return Some(PhpType::Bool); + } + + emitter.instruction("bl __rt_array_is_list"); // walk the hash insertion order to test list shape + Some(PhpType::Bool) +} diff --git a/src/codegen/builtins/arrays/array_key_edge.rs b/src/codegen/builtins/arrays/array_key_edge.rs new file mode 100644 index 000000000..35279130e --- /dev/null +++ b/src/codegen/builtins/arrays/array_key_edge.rs @@ -0,0 +1,66 @@ +//! Purpose: +//! Emits PHP `array_key_first` and `array_key_last` builtin calls. +//! 
Returns the first or last key of an array boxed as a Mixed value. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! - The selected key is boxed by `__rt_array_edge_key`; empty containers yield a boxed null. + +use crate::codegen::abi; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::codegen::expr::emit_expr; +use crate::codegen::platform::Arch; +use crate::parser::ast::Expr; +use crate::types::PhpType; + +/// Emits code for the PHP `array_key_first` / `array_key_last` builtins. +/// +/// `array_key_first($array)` returns the first key, `array_key_last($array)` the +/// last key, both in insertion order, or `null` when the array is empty. The key +/// is returned as a boxed `Mixed` value so int and string keys share one path. +/// +/// # Arguments +/// - `name`: selects the variant; `"array_key_last"` reads the tail, otherwise the head. +/// - `args[0]`: the array expression, evaluated into the container register. +/// +/// # Codegen +/// - Evaluates `args[0]` into the container register. +/// - Loads the first/last selector and calls `__rt_array_edge_key`, which reads the +/// heap kind, picks the head or tail entry, and boxes the key (or null) as a Mixed cell. +/// +/// # Returns +/// `Some(PhpType::Mixed)` — the boxed key (or boxed null) in the integer result register. 
+pub fn emit( + name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + let last = name == "array_key_last"; + emitter.comment(if last { "array_key_last()" } else { "array_key_first()" }); + emit_expr(&args[0], emitter, ctx, data); + + if emitter.target.arch == Arch::X86_64 { + emitter.instruction("mov rdi, rax"); // move the container pointer into the first x86_64 argument register + if last { + emitter.instruction("mov esi, 1"); // select the last key + } else { + emitter.instruction("mov esi, 0"); // select the first key + } + abi::emit_call_label(emitter, "__rt_array_edge_key"); + return Some(PhpType::Mixed); + } + + if last { + emitter.instruction("mov x1, #1"); // select the last key + } else { + emitter.instruction("mov x1, #0"); // select the first key + } + emitter.instruction("bl __rt_array_edge_key"); // box the selected key as a Mixed value + Some(PhpType::Mixed) +} diff --git a/src/codegen/builtins/arrays/array_merge_recursive.rs b/src/codegen/builtins/arrays/array_merge_recursive.rs new file mode 100644 index 000000000..89c384221 --- /dev/null +++ b/src/codegen/builtins/arrays/array_merge_recursive.rs @@ -0,0 +1,53 @@ +//! Purpose: +//! Emits PHP `array_merge_recursive` builtin calls over two associative arrays. +//! Materializes both array pointers and delegates the recursive merge to the runtime helper. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! - Operates on hash inputs; the runtime recurses on array collisions and combines scalar collisions. +//! - Scalar indexed-array inputs are converted to integer-keyed hashes by the shared `hash_arg_call` helper. 
+ +use crate::codegen::builtins::arrays::hash_arg_call::emit_two_hash_arg_call; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::parser::ast::Expr; +use crate::types::PhpType; + +/// Emits code for the PHP `array_merge_recursive` builtin. +/// +/// `array_merge_recursive($a, $b)` merges two associative arrays: integer-keyed entries +/// append with renumbering; string keys that collide recurse when both values are arrays, +/// otherwise the values are combined into a list. The result preserves the first array's +/// key space (widened where the second array contributes). +/// +/// # Codegen +/// - Evaluates `args[0]` (first hash), spills it, evaluates `args[1]` (second hash). +/// - Materializes both pointers and calls `__rt_array_merge_recursive`. +/// +/// # Returns +/// `Some(PhpType::AssocArray { .. })` — key widened across both inputs, value always `Mixed`. +/// +/// # ABI +/// - AArch64: first hash in `x0`, second hash in `x1`; result hash in `x0`. +/// - x86_64: first hash in `rdi`, second hash in `rsi`; result hash in `rax`. +pub fn emit( + _name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + emitter.comment("array_merge_recursive()"); + let (ty0, ty1) = + emit_two_hash_arg_call(args, emitter, ctx, data, "__rt_array_merge_recursive", None); + + // Scalar collisions combine into lists, so the result value type is always Mixed. The key + // widens to Mixed when the two inputs disagree (e.g. an indexed input mixed with string keys). + Some(PhpType::AssocArray { + key: Box::new(PhpType::widen(ty0.hash_key_type(), ty1.hash_key_type())), + value: Box::new(PhpType::Mixed), + }) +} diff --git a/src/codegen/builtins/arrays/array_multisort.rs b/src/codegen/builtins/arrays/array_multisort.rs new file mode 100644 index 000000000..07ffc0f2b --- /dev/null +++ b/src/codegen/builtins/arrays/array_multisort.rs @@ -0,0 +1,77 @@ +//! Purpose: +//! 
Emits PHP `array_multisort` builtin calls over two parallel arrays, mutating both in place. +//! Prepares copy-on-write storage for each by-reference array, then sorts them in tandem. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! - Both arrays are by-reference; their PHP-visible storage is updated before the in-place tandem sort. + +use crate::codegen::abi; +use super::ensure_unique_arg::emit_ensure_unique_arg; +use super::store_mutating_arg::emit_store_mutating_arg; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::codegen::expr::emit_expr; +use crate::codegen::platform::Arch; +use crate::parser::ast::Expr; +use crate::types::PhpType; + +/// Emits the PHP `array_multisort($arr1, $arr2)` builtin call. +/// +/// Sorts `$arr1` ascending (stable) and reorders `$arr2` in tandem so the two parallel +/// arrays stay aligned. Both arguments are by-reference: each is made copy-on-write unique +/// and its caller storage is updated before the runtime sorts both arrays in place. +/// +/// Supports the common two-array form with scalar (integer) elements and ascending order. +/// Sort flags, descending order, multi-key tie-breaking, and more than two arrays are not +/// yet supported. +/// +/// # Returns +/// `Some(PhpType::Bool)` — `array_multisort()` returns `true` on success. +/// +/// # ABI +/// - AArch64: arr1 in `x0`, arr2 in `x1`. x86_64: arr1 in `rdi`, arr2 in `rsi`. 
+pub fn emit( + _name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + emitter.comment("array_multisort()"); + let result_reg = abi::int_result_reg(emitter); + + // -- first array: evaluate, split shared storage, write the pointer back to its variable -- + let ty1 = emit_expr(&args[0], emitter, ctx, data); + emit_ensure_unique_arg(emitter, &ty1); + emit_store_mutating_arg(emitter, ctx, &args[0]); + abi::emit_push_reg(emitter, result_reg); // save arr1 pointer while evaluating arr2 + + // -- second array: evaluate, split shared storage, write the pointer back to its variable -- + let ty2 = emit_expr(&args[1], emitter, ctx, data); + emit_ensure_unique_arg(emitter, &ty2); + emit_store_mutating_arg(emitter, ctx, &args[1]); + + match emitter.target.arch { + Arch::AArch64 => { + emitter.instruction("mov x1, x0"); // move arr2 pointer into the second runtime argument register + abi::emit_pop_reg(emitter, "x0"); + } + Arch::X86_64 => { + emitter.instruction("mov rsi, rax"); // move arr2 pointer into the second SysV runtime argument register + abi::emit_pop_reg(emitter, "rdi"); + } + } + abi::emit_call_label(emitter, "__rt_array_multisort"); + + // -- array_multisort returns true on success -- + match emitter.target.arch { + Arch::AArch64 => emitter.instruction("mov x0, #1"), // result: true + Arch::X86_64 => emitter.instruction("mov eax, 1"), // result: true + } + + Some(PhpType::Bool) +} diff --git a/src/codegen/builtins/arrays/array_replace.rs b/src/codegen/builtins/arrays/array_replace.rs new file mode 100644 index 000000000..00874af8a --- /dev/null +++ b/src/codegen/builtins/arrays/array_replace.rs @@ -0,0 +1,56 @@ +//! Purpose: +//! Emits PHP `array_replace` builtin calls over two associative arrays. +//! Materializes both array pointers and delegates the right-wins key merge to the runtime helper. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! 
- Operates on hash inputs; the runtime clones the first array and overwrites/appends the second. +//! - Scalar indexed-array inputs are converted to integer-keyed hashes by the shared `hash_arg_call` helper. + +use crate::codegen::builtins::arrays::hash_arg_call::emit_two_hash_arg_call; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::parser::ast::Expr; +use crate::types::PhpType; + +/// Emits code for the PHP `array_replace` / `array_replace_recursive` builtins. +/// +/// `array_replace($array, $replacements)` returns a new array: a copy of `$array` +/// in which every key present in `$replacements` is overwritten (in place, keeping +/// the original position) and every new key is appended. Later values win. +/// +/// # Codegen +/// - Delegates argument evaluation to `hash_arg_call::emit_two_hash_arg_call`, which evaluates +/// both arguments in source order (converting a scalar indexed input to a hash) and calls +/// `__rt_array_replace`, which shallow-clones the first hash and inserts every entry of the +/// second through `__rt_hash_set` (right-wins), retaining heap/string values. +/// +/// # Returns +/// `Some(PhpType::two_input_hash_result(&ty0, &ty1))` — the result is always a hash, so an indexed +/// input is reported as an associative result; key/value widen to `Mixed` when the inputs disagree. +pub fn emit( + name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + let recursive = name == "array_replace_recursive"; + emitter.comment(if recursive { + "array_replace_recursive()" + } else { + "array_replace()" + }); + let runtime_label = if recursive { + "__rt_array_replace_recursive" + } else { + "__rt_array_replace" + }; + let (ty0, ty1) = emit_two_hash_arg_call(args, emitter, ctx, data, runtime_label, None); + + // The runtime always produces a hash; key/value widen to Mixed when the two inputs disagree. 
+ Some(PhpType::two_input_hash_result(&ty0, &ty1)) +} diff --git a/src/codegen/builtins/arrays/array_udiff_uintersect.rs b/src/codegen/builtins/arrays/array_udiff_uintersect.rs new file mode 100644 index 000000000..12e50f520 --- /dev/null +++ b/src/codegen/builtins/arrays/array_udiff_uintersect.rs @@ -0,0 +1,74 @@ +//! Purpose: +//! Emits PHP `array_udiff` and `array_uintersect` builtins (user-comparator difference/intersection). +//! Materializes both arrays and the comparator callback, then dispatches to the runtime helper. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! - Comparator is a two-argument callback (string / function / non-capturing closure); equal when `cmp(a, b) === 0`. + +use crate::codegen::abi; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::codegen::expr::emit_expr; +use crate::parser::ast::Expr; +use crate::types::PhpType; +use super::callback_env; + +/// Emits the PHP `array_udiff` / `array_uintersect` builtins. +/// +/// `array_udiff($a, $b, $cmp)` returns the elements of `$a` not found in `$b`, +/// `array_uintersect($a, $b, $cmp)` the elements found in both, where two values are equal +/// when the two-argument comparator returns `0`. The result is an indexed array of the kept +/// elements (repacked at sequential indices, like `array_diff`). +/// +/// Evaluates `$a`, then `$b`, then the comparator (PHP source order). The runtime helper +/// `__rt_array_udiff_uintersect` receives `(comparator, arr1, arr2, env, mode)` with mode +/// `0` (udiff) or `1` (uintersect). +/// +/// Supports string, plain-function, and non-capturing closure comparators (the dominant +/// forms); capturing-closure comparators are not yet supported. Operates on indexed arrays +/// with scalar elements (consistent with `array_diff`). 
+/// +/// # Returns +/// `Some(arr_ty)` — the first array's type (the kept elements share its element type). +pub fn emit( + name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + let mode: i64 = if name == "array_uintersect" { 1 } else { 0 }; + emitter.comment(&format!("{}()", name)); + + let call_reg = abi::nested_call_reg(emitter); + let result_reg = abi::int_result_reg(emitter); + let cmp_arg = abi::int_arg_reg_name(emitter.target, 0); + let arr1_arg = abi::int_arg_reg_name(emitter.target, 1); + let arr2_arg = abi::int_arg_reg_name(emitter.target, 2); + let env_arg = abi::int_arg_reg_name(emitter.target, 3); + let mode_arg = abi::int_arg_reg_name(emitter.target, 4); + + // -- evaluate the two arrays in source order, saving each on the temporary stack -- + let arr_ty = emit_expr(&args[0], emitter, ctx, data); + abi::emit_push_reg(emitter, result_reg); // save arr1 pointer onto the temporary stack + emit_expr(&args[1], emitter, ctx, data); + abi::emit_push_reg(emitter, result_reg); // save arr2 pointer onto the temporary stack + + // -- resolve the comparator callback into the nested-call register -- + let _captures = + callback_env::materialize_callback_address(&args[2], call_reg, emitter, ctx, data); + + // -- materialize arguments into the runtime registers (comparator, arr1, arr2, env, mode) -- + abi::emit_pop_reg(emitter, arr2_arg); // pop arr2 into the third runtime argument register + abi::emit_pop_reg(emitter, arr1_arg); // pop arr1 into the second runtime argument register + emitter.instruction(&format!("mov {}, {}", cmp_arg, call_reg)); // move the comparator address into the first runtime argument register + abi::emit_load_int_immediate(emitter, env_arg, 0); + abi::emit_load_int_immediate(emitter, mode_arg, mode); + abi::emit_call_label(emitter, "__rt_array_udiff_uintersect"); + + Some(arr_ty) +} diff --git a/src/codegen/builtins/arrays/array_walk.rs 
b/src/codegen/builtins/arrays/array_walk.rs index 6d7ee296b..9b4517a46 100644 --- a/src/codegen/builtins/arrays/array_walk.rs +++ b/src/codegen/builtins/arrays/array_walk.rs @@ -27,13 +27,23 @@ use super::runtime_string_callback; /// expressions use descriptor-backed environments so receiver/capture metadata survives /// runtime selection. Returns `PhpType::Void` on success. pub fn emit( - _name: &str, + name: &str, args: &[Expr], emitter: &mut Emitter, ctx: &mut Context, data: &mut DataSection, ) -> Option<PhpType> { - emitter.comment("array_walk()"); + let recursive = name == "array_walk_recursive"; + emitter.comment(if recursive { + "array_walk_recursive()" + } else { + "array_walk()" + }); + let walk_label = if recursive { + "__rt_array_walk_recursive" + } else { + "__rt_array_walk" + }; let call_reg = abi::nested_call_reg(emitter); let result_reg = abi::int_result_reg(emitter); let callback_arg_reg = abi::int_arg_reg_name(emitter.target, 0); @@ -42,9 +52,15 @@ pub fn emit( // -- evaluate the array argument (first arg) -- let arr_ty = emit_expr(&args[0], emitter, ctx, data); - let source_elem_ty = match &arr_ty { - PhpType::Array(elem_ty) => elem_ty.codegen_repr(), - _ => PhpType::Int, + // array_walk_recursive invokes the callback on leaf (non-array) values, so the + // callback wrapper is built for the recursively-unwrapped leaf element type. 
+ let source_elem_ty = if recursive { + leaf_element_type(&arr_ty) + } else { + match &arr_ty { + PhpType::Array(elem_ty) => elem_ty.codegen_repr(), + _ => PhpType::Int, + } }; // -- save array pointer -- @@ -63,7 +79,7 @@ pub fn emit( callback_env::load_env_slot_to_reg(emitter, array_arg_reg, wrapper.array_slot_offset); abi::emit_symbol_address(emitter, callback_arg_reg, &wrapper.wrapper_label); callback_env::load_env_pointer_to_reg(emitter, env_arg_reg); - abi::emit_call_label(emitter, "__rt_array_walk"); // call the callback-driven walk runtime helper with a runtime string descriptor + abi::emit_call_label(emitter, walk_label); // call the callback-driven walk runtime helper with a runtime string descriptor }, ) { return Some(PhpType::Void); @@ -82,7 +98,7 @@ pub fn emit( callback_env::load_env_slot_to_reg(emitter, array_arg_reg, wrapper.array_slot_offset); abi::emit_symbol_address(emitter, callback_arg_reg, &wrapper.wrapper_label); callback_env::load_env_pointer_to_reg(emitter, env_arg_reg); - abi::emit_call_label(emitter, "__rt_array_walk"); // call the callback-driven walk runtime helper with a callable-array descriptor environment + abi::emit_call_label(emitter, walk_label); // call the callback-driven walk runtime helper with a callable-array descriptor environment callback_env::release_descriptor_callback_env(&wrapper, emitter); return Some(PhpType::Void); } @@ -99,7 +115,7 @@ pub fn emit( callback_env::load_env_slot_to_reg(emitter, array_arg_reg, wrapper.array_slot_offset); abi::emit_symbol_address(emitter, callback_arg_reg, &wrapper.wrapper_label); callback_env::load_env_pointer_to_reg(emitter, env_arg_reg); - abi::emit_call_label(emitter, "__rt_array_walk"); // call the callback-driven walk runtime helper with a runtime callable-array descriptor + abi::emit_call_label(emitter, walk_label); // call the callback-driven walk runtime helper with a runtime callable-array descriptor }, ) { return Some(PhpType::Void); @@ -124,7 +140,7 @@ pub fn emit( 
callback_env::load_env_slot_to_reg(emitter, array_arg_reg, wrapper.array_slot_offset); abi::emit_symbol_address(emitter, callback_arg_reg, &wrapper.wrapper_label); callback_env::load_env_pointer_to_reg(emitter, env_arg_reg); - abi::emit_call_label(emitter, "__rt_array_walk"); // call the callback-driven walk runtime helper with a descriptor environment + abi::emit_call_label(emitter, walk_label); // call the callback-driven walk runtime helper with a descriptor environment callback_env::release_descriptor_callback_env(&wrapper, emitter); return Some(PhpType::Void); } @@ -147,7 +163,7 @@ pub fn emit( callback_env::load_env_slot_to_reg(emitter, array_arg_reg, wrapper.array_slot_offset); abi::emit_symbol_address(emitter, callback_arg_reg, &wrapper.wrapper_label); callback_env::load_env_pointer_to_reg(emitter, env_arg_reg); - abi::emit_call_label(emitter, "__rt_array_walk"); // call the callback-driven walk runtime helper with a capture environment + abi::emit_call_label(emitter, walk_label); // call the callback-driven walk runtime helper with a capture environment abi::emit_release_temporary_stack(emitter, wrapper.env_bytes); return Some(PhpType::Void); } else { @@ -155,7 +171,21 @@ pub fn emit( emitter.instruction(&format!("mov {}, {}", callback_arg_reg, call_reg)); // move the callback function address into the first runtime argument register } abi::emit_load_int_immediate(emitter, env_arg_reg, 0); - abi::emit_call_label(emitter, "__rt_array_walk"); // call the callback-driven walk runtime helper + abi::emit_call_label(emitter, walk_label); // call the callback-driven walk runtime helper Some(PhpType::Void) } + +/// Recursively unwraps a nested array type to its scalar leaf element type. +/// +/// `array_walk_recursive` invokes the callback on the deepest non-array values, so the +/// callback wrapper must be built for the leaf type rather than the immediate element type. 
+/// Walks through `Array` element types and `AssocArray` value types until a non-array type +/// is reached, returning its codegen representation. +fn leaf_element_type(ty: &PhpType) -> PhpType { + match ty { + PhpType::Array(inner) => leaf_element_type(inner), + PhpType::AssocArray { value, .. } => leaf_element_type(value), + other => other.codegen_repr(), + } +} diff --git a/src/codegen/builtins/arrays/assoc_diff_intersect.rs b/src/codegen/builtins/arrays/assoc_diff_intersect.rs new file mode 100644 index 000000000..e885f65d8 --- /dev/null +++ b/src/codegen/builtins/arrays/assoc_diff_intersect.rs @@ -0,0 +1,63 @@ +//! Purpose: +//! Emits PHP `array_diff_assoc` and `array_intersect_assoc` builtin calls over two associative arrays. +//! Materializes both array pointers and a mode selector, then delegates to the unified runtime helper. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::emit()`. +//! +//! Key details: +//! - Operates on hash inputs; the runtime compares keys and string-cast values. +//! - Scalar indexed-array inputs are converted to integer-keyed hashes by the shared `hash_arg_call` helper. + +use crate::codegen::builtins::arrays::hash_arg_call::emit_two_hash_arg_call; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::parser::ast::Expr; +use crate::types::PhpType; + +/// Emits code for the PHP `array_diff_assoc` / `array_intersect_assoc` builtins. +/// +/// Both compare entries of the first array against the second using **both** the key +/// and the string-cast value (`(string)$a === (string)$b`): +/// - `array_diff_assoc` keeps entries of `$array` whose (key, value) pair is absent from `$other`. +/// - `array_intersect_assoc` keeps entries whose (key, value) pair is present in `$other`. +/// +/// # Codegen +/// - Evaluates `args[0]` (first hash), spills it, evaluates `args[1]` (second hash). 
+/// - Loads both pointers and a mode selector (0 = diff, 1 = intersect) and calls +/// `__rt_assoc_diff_intersect`, which iterates the first hash, looks each key up in the +/// second, and string-compares the values, retaining kept entries for the result. +/// +/// # Returns +/// `Some(ty)`: the hash type derived from both arguments via `PhpType::two_input_hash_result` (key/value widen to `Mixed` when the two inputs disagree). +/// +/// # ABI +/// - AArch64: hash1 in `x0`, hash2 in `x1`, mode in `x2`; result hash in `x0`. +/// - x86_64: hash1 in `rdi`, hash2 in `rsi`, mode in `rdx`; result hash in `rax`. +pub fn emit( + name: &str, + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, +) -> Option<PhpType> { + let intersect = name == "array_intersect_assoc"; + emitter.comment(if intersect { + "array_intersect_assoc()" + } else { + "array_diff_assoc()" + }); + let mode = if intersect { 1 } else { 0 }; + let (ty0, ty1) = emit_two_hash_arg_call( + args, + emitter, + ctx, + data, + "__rt_assoc_diff_intersect", + Some(mode), + ); + + // The runtime always produces a hash; key/value widen to Mixed when the two inputs disagree. + Some(PhpType::two_input_hash_result(&ty0, &ty1)) +} diff --git a/src/codegen/builtins/arrays/hash_arg_call.rs b/src/codegen/builtins/arrays/hash_arg_call.rs new file mode 100644 index 000000000..e1ea0d4ca --- /dev/null +++ b/src/codegen/builtins/arrays/hash_arg_call.rs @@ -0,0 +1,138 @@ +//! Purpose: +//! Shared two-array argument choreography for the hash-based array builtins +//! (`array_replace`, `array_replace_recursive`, `array_diff_assoc`, `array_intersect_assoc`, +//! `array_merge_recursive`). Accepts scalar indexed-array inputs by converting them to hashes. +//! +//! Called from: +//! - `crate::codegen::builtins::arrays::{array_replace, assoc_diff_intersect, array_merge_recursive}::emit()`. +//! +//! Key details: +//! - A scalar indexed input is converted to an owned integer-keyed hash via `__rt_array_to_hash`. +//! 
Converted temporaries are released with `__rt_decref_hash` after the runtime reads them; the +//! result (independently owned) is preserved across the frees. Scalar element values carry no +//! heap children, so freeing the converted temporaries cannot disturb the result. + +use crate::codegen::abi; +use crate::codegen::context::Context; +use crate::codegen::data_section::DataSection; +use crate::codegen::emit::Emitter; +use crate::codegen::expr::emit_expr; +use crate::codegen::platform::Arch; +use crate::parser::ast::Expr; +use crate::types::PhpType; + +/// Returns true if the type is an indexed array that the emitter must convert to a hash. +fn needs_conversion(ty: &PhpType) -> bool { + matches!(ty, PhpType::Array(_)) +} + +/// Converts the indexed array currently in the integer result register to an owned hash. +/// +/// `__rt_array_to_hash` takes its argument in the first argument register. On AArch64 the result +/// register `x0` already is that register, but on x86_64 the result lives in `rax`, so it must be +/// moved into `rdi` before the call. +fn emit_convert_indexed(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emitter.instruction("mov rdi, rax"); // move the array pointer into the first SysV argument register + } + abi::emit_call_label(emitter, "__rt_array_to_hash"); +} + +/// Emits the two-array argument choreography for a hash-based builtin and calls `runtime_label`. +/// +/// Evaluates both arguments in source order. Any indexed-array argument is converted to an owned +/// hash with `__rt_array_to_hash`. When `mode` is `Some`, its value is loaded into the third +/// runtime-argument register (used by `array_diff_assoc` / `array_intersect_assoc`). Converted +/// temporaries are released with `__rt_decref_hash` after the call, preserving the result. 
+/// +/// Leaves the result hash pointer in the integer result register and returns both arguments' +/// static types `(ty0, ty1)` so the caller can derive the builtin's result type (key/value +/// widening across the two inputs). +pub fn emit_two_hash_arg_call( + args: &[Expr], + emitter: &mut Emitter, + ctx: &mut Context, + data: &mut DataSection, + runtime_label: &str, + mode: Option, +) -> (PhpType, PhpType) { + let ty0 = emit_expr(&args[0], emitter, ctx, data); + let conv0 = needs_conversion(&ty0); + if conv0 { + emit_convert_indexed(emitter); // convert the indexed first argument to an owned hash + } + abi::emit_push_reg(emitter, abi::int_result_reg(emitter)); + + let ty1 = emit_expr(&args[1], emitter, ctx, data); + let conv1 = needs_conversion(&ty1); + if conv1 { + emit_convert_indexed(emitter); // convert the indexed second argument to an owned hash + } + + if !conv0 && !conv1 { + // Fast path: both inputs are already hashes, no temporaries to free. + match emitter.target.arch { + Arch::AArch64 => { + emitter.instruction("mov x1, x0"); // move the second hash pointer into the second runtime argument register + abi::emit_pop_reg(emitter, "x0"); + if let Some(m) = mode { + emitter.instruction(&format!("mov x2, #{}", m)); // mode selector into the third runtime argument register + } + } + Arch::X86_64 => { + emitter.instruction("mov rsi, rax"); // move the second hash pointer into the second SysV argument register + abi::emit_pop_reg(emitter, "rdi"); + if let Some(m) = mode { + emitter.instruction(&format!("mov edx, {}", m)); // mode selector into the third SysV argument register + } + } + } + abi::emit_call_label(emitter, runtime_label); + return (ty0, ty1); + } + + // Freeing path: at least one input was converted to a temporary hash that must be released. 
+ abi::emit_push_reg(emitter, abi::int_result_reg(emitter)); // spill the second hash; stack holds [h2, h1] + match emitter.target.arch { + Arch::AArch64 => { + emitter.instruction("ldr x0, [sp, #16]"); // load the first hash pointer (kept on the stack for freeing) + emitter.instruction("ldr x1, [sp]"); // load the second hash pointer (kept on the stack for freeing) + if let Some(m) = mode { + emitter.instruction(&format!("mov x2, #{}", m)); // mode selector into the third runtime argument register + } + abi::emit_call_label(emitter, runtime_label); // result hash pointer returned in x0 + emitter.instruction("str x0, [sp, #-16]!"); // spill the result; stack holds [result, h2, h1] + if conv1 { + emitter.instruction("ldr x0, [sp, #16]"); // reload the converted second hash temporary + abi::emit_call_label(emitter, "__rt_decref_hash"); // release the converted second hash temporary + } + if conv0 { + emitter.instruction("ldr x0, [sp, #32]"); // reload the converted first hash temporary + abi::emit_call_label(emitter, "__rt_decref_hash"); // release the converted first hash temporary + } + emitter.instruction("ldr x0, [sp], #16"); // restore the result hash pointer + emitter.instruction("add sp, sp, #32"); // discard the two spilled input hash pointers + } + Arch::X86_64 => { + emitter.instruction("mov rdi, QWORD PTR [rsp + 16]"); // load the first hash pointer (kept on the stack for freeing) + emitter.instruction("mov rsi, QWORD PTR [rsp]"); // load the second hash pointer (kept on the stack for freeing) + if let Some(m) = mode { + emitter.instruction(&format!("mov edx, {}", m)); // mode selector into the third SysV argument register + } + abi::emit_call_label(emitter, runtime_label); // result hash pointer returned in rax + emitter.instruction("sub rsp, 16"); // reserve a slot for the result + emitter.instruction("mov QWORD PTR [rsp], rax"); // spill the result; stack holds [result, h2, h1] + if conv1 { + emitter.instruction("mov rdi, QWORD PTR [rsp + 16]"); // 
reload the converted second hash temporary + abi::emit_call_label(emitter, "__rt_decref_hash"); // release the converted second hash temporary + } + if conv0 { + emitter.instruction("mov rdi, QWORD PTR [rsp + 32]"); // reload the converted first hash temporary + abi::emit_call_label(emitter, "__rt_decref_hash"); // release the converted first hash temporary + } + emitter.instruction("mov rax, QWORD PTR [rsp]"); // restore the result hash pointer + emitter.instruction("add rsp, 48"); // discard the result slot and the two spilled inputs + } + } + (ty0, ty1) +} diff --git a/src/codegen/builtins/arrays/mod.rs b/src/codegen/builtins/arrays/mod.rs index 7335daf1d..e8eb87fb3 100644 --- a/src/codegen/builtins/arrays/mod.rs +++ b/src/codegen/builtins/arrays/mod.rs @@ -19,12 +19,20 @@ mod array_filter; mod array_flip; mod array_intersect; mod array_intersect_key; +mod array_is_list; +mod array_find_any_all; +mod array_udiff_uintersect; +mod assoc_diff_intersect; +mod array_key_edge; mod array_key_exists; pub(crate) mod array_keys; mod array_map; mod array_map_callback_returns_str; mod array_map_expr_is_str; mod array_merge; +mod array_merge_recursive; +mod array_multisort; +mod array_replace; mod array_pad; mod array_pop; mod array_product; @@ -53,6 +61,7 @@ mod count; pub(crate) mod descriptor_arg_builder; mod ensure_unique_arg; mod function_exists; +mod hash_arg_call; mod in_array; mod isset; mod krsort; @@ -105,10 +114,21 @@ pub fn emit( "array_shift" => array_shift::emit(name, args, emitter, ctx, data), "array_unshift" => array_unshift::emit(name, args, emitter, ctx, data), "array_merge" => array_merge::emit(name, args, emitter, ctx, data), + "array_replace" | "array_replace_recursive" => { + array_replace::emit(name, args, emitter, ctx, data) + } + "array_merge_recursive" => { + array_merge_recursive::emit(name, args, emitter, ctx, data) + } + "array_multisort" => array_multisort::emit(name, args, emitter, ctx, data), "array_slice" => array_slice::emit(name, args, 
emitter, ctx, data), "array_splice" => array_splice::emit(name, args, emitter, ctx, data), "array_combine" => array_combine::emit(name, args, emitter, ctx, data), "array_flip" => array_flip::emit(name, args, emitter, ctx, data), + "array_is_list" => array_is_list::emit(name, args, emitter, ctx, data), + "array_key_first" | "array_key_last" => { + array_key_edge::emit(name, args, emitter, ctx, data) + } "array_chunk" => array_chunk::emit(name, args, emitter, ctx, data), "array_column" => array_column::emit(name, args, emitter, ctx, data), "array_pad" => array_pad::emit(name, args, emitter, ctx, data), @@ -117,6 +137,9 @@ pub fn emit( "array_diff" => array_diff::emit(name, args, emitter, ctx, data), "array_intersect" => array_intersect::emit(name, args, emitter, ctx, data), "array_diff_key" => array_diff_key::emit(name, args, emitter, ctx, data), + "array_diff_assoc" | "array_intersect_assoc" => { + assoc_diff_intersect::emit(name, args, emitter, ctx, data) + } "array_intersect_key" => array_intersect_key::emit(name, args, emitter, ctx, data), "array_rand" => array_rand::emit(name, args, emitter, ctx, data), "shuffle" => shuffle_fn::emit(name, args, emitter, ctx, data), @@ -129,8 +152,16 @@ pub fn emit( "natcasesort" => natcasesort::emit(name, args, emitter, ctx, data), "array_map" => array_map::emit(name, args, emitter, ctx, data), "array_filter" => array_filter::emit(name, args, emitter, ctx, data), + "array_find" | "array_any" | "array_all" => { + array_find_any_all::emit(name, args, emitter, ctx, data) + } + "array_udiff" | "array_uintersect" => { + array_udiff_uintersect::emit(name, args, emitter, ctx, data) + } "array_reduce" => array_reduce::emit(name, args, emitter, ctx, data), - "array_walk" => array_walk::emit(name, args, emitter, ctx, data), + "array_walk" | "array_walk_recursive" => { + array_walk::emit(name, args, emitter, ctx, data) + } "buffer_free" => buffer_free::emit(name, args, emitter, ctx, data), "buffer_len" => buffer_len::emit(name, args, 
emitter, ctx, data), "usort" => usort::emit(name, args, emitter, ctx, data), diff --git a/src/codegen/functions/types/builtins.rs b/src/codegen/functions/types/builtins.rs index a310cc605..60903a04a 100644 --- a/src/codegen/functions/types/builtins.rs +++ b/src/codegen/functions/types/builtins.rs @@ -114,10 +114,29 @@ pub(super) fn infer_function_call_type( }, } } - "array_diff_key" | "array_intersect_key" => args - .first() - .map(|arg| infer_local_type(arg, sig, ctx)) - .unwrap_or_else(|| PhpType::Array(Box::new(PhpType::Int))), + "array_diff_key" | "array_intersect_key" | "array_replace" | "array_replace_recursive" + | "array_diff_assoc" | "array_intersect_assoc" => match (args.first(), args.get(1)) { + (Some(a0), Some(a1)) => PhpType::two_input_hash_result( + &infer_local_type(a0, sig, ctx), + &infer_local_type(a1, sig, ctx), + ), + (Some(a0), None) => infer_local_type(a0, sig, ctx).as_hash(), + _ => PhpType::Array(Box::new(PhpType::Int)), + }, + "array_merge_recursive" => { + let key = match (args.first(), args.get(1)) { + (Some(a0), Some(a1)) => PhpType::widen( + infer_local_type(a0, sig, ctx).hash_key_type(), + infer_local_type(a1, sig, ctx).hash_key_type(), + ), + (Some(a0), None) => infer_local_type(a0, sig, ctx).hash_key_type(), + _ => PhpType::Int, + }; + PhpType::AssocArray { + key: Box::new(key), + value: Box::new(PhpType::Mixed), + } + } "explode" | "str_split" | "file" @@ -132,6 +151,8 @@ pub(super) fn infer_function_call_type( | "array_fill" | "array_diff" | "array_intersect" + | "array_udiff" + | "array_uintersect" | "array_splice" | "array_column" | "array_map" @@ -169,7 +190,8 @@ pub(super) fn infer_function_call_type( "is_callable" | "is_int" | "is_float" | "is_string" | "is_bool" | "is_null" | "is_numeric" | "is_nan" | "is_finite" | "is_infinite" | "is_array" | "empty" | "isset" | "is_file" | "is_dir" | "is_readable" | "is_writable" | "file_exists" - | "in_array" | "array_key_exists" | "str_contains" | "str_starts_with" + | "in_array" | 
"array_key_exists" | "array_is_list" | "array_any" | "array_all" + | "array_multisort" | "str_contains" | "str_starts_with" | "str_ends_with" | "ctype_alpha" | "ctype_digit" | "ctype_alnum" | "ctype_space" | "function_exists" | "chmod" | "chown" | "chgrp" | "touch" | "ftruncate" | "fflush" | "fsync" | "fdatasync" | "ptr_is_null" @@ -182,7 +204,8 @@ pub(super) fn infer_function_call_type( | "fputcsv" => PhpType::Int, "strpos" | "strrpos" | "array_search" | "file_get_contents" | "json_encode" | "fileatime" | "filectime" | "fileperms" | "fileowner" | "filegroup" | "fileinode" - | "filetype" | "stat" | "lstat" | "fstat" | "fgetc" | "readfile" | "readlink" => PhpType::Mixed, + | "filetype" | "stat" | "lstat" | "fstat" | "fgetc" | "readfile" | "readlink" + | "array_key_first" | "array_key_last" | "array_find" => PhpType::Mixed, "fopen" | "tmpfile" => merge_union_members(vec![PhpType::stream_resource(), PhpType::Bool]), "pathinfo" => infer_pathinfo_type(args), "abs" => { diff --git a/src/codegen/runtime/arrays/array_edge_key.rs b/src/codegen/runtime/arrays/array_edge_key.rs new file mode 100644 index 000000000..f1a4c831e --- /dev/null +++ b/src/codegen/runtime/arrays/array_edge_key.rs @@ -0,0 +1,174 @@ +//! Purpose: +//! Emits the `__rt_array_edge_key` runtime helper assembly for array_key_first / array_key_last. +//! Returns the first or last key of a PHP array boxed as a Mixed cell. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - The key is boxed through `__rt_mixed_from_value` (tail call); empty containers yield a boxed null. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_edge_key: box the first or last key of a container as a Mixed cell. 
+/// Input: x0 = container pointer (indexed array, hash, or boxed mixed cell) +/// x1 = which (0 = first key, 1 = last key) +/// Output: x0 = boxed Mixed key, or boxed null when the container is empty / not an array +/// +/// Tail-calls `__rt_mixed_from_value` so the boxed result is returned directly to the +/// original caller. Integer keys box with tag 0, string keys with tag 1 (the string is +/// persisted by the box helper), and empty/non-array inputs box with the null tag 8. +pub fn emit_array_edge_key(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_edge_key_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_edge_key ---"); + emitter.label_global("__rt_array_edge_key"); + emitter.instruction("ldr x9, [x0, #-8]"); // load the uniform heap-kind header word + emitter.instruction("and x9, x9, #0xff"); // isolate the low-byte heap kind + emitter.instruction("cmp x9, #5"); // is the container a boxed mixed cell? + emitter.instruction("b.eq __rt_array_edge_key_mixed"); // mixed cells are unwrapped first + emitter.instruction("cmp x9, #2"); // is the container an indexed array? + emitter.instruction("b.eq __rt_array_edge_key_indexed"); // indexed arrays use positional keys + emitter.instruction("cmp x9, #3"); // is the container an associative hash? 
+ emitter.instruction("b.eq __rt_array_edge_key_hash"); // hashes read the head/tail slot + emitter.instruction("b __rt_array_edge_key_null"); // any other kind has no key + emitter.label("__rt_array_edge_key_indexed"); + emitter.instruction("ldr x10, [x0, #0]"); // x10 = element count from the array header + emitter.instruction("cbz x10, __rt_array_edge_key_null"); // empty arrays have no key + emitter.instruction("cbz x1, __rt_array_edge_key_idx_first"); // which == 0 selects the first index + emitter.instruction("sub x10, x10, #1"); // last index = element count - 1 + emitter.instruction("mov x1, x10"); // value_lo = last index + emitter.instruction("mov x0, #0"); // value_tag = 0 (integer) + emitter.instruction("mov x2, #0"); // value_hi unused for integers + emitter.instruction("b __rt_mixed_from_value"); // box the integer key and return it to the caller + emitter.label("__rt_array_edge_key_idx_first"); + emitter.instruction("mov x0, #0"); // value_tag = 0 (integer) + emitter.instruction("mov x1, #0"); // value_lo = first index 0 + emitter.instruction("mov x2, #0"); // value_hi unused for integers + emitter.instruction("b __rt_mixed_from_value"); // box the integer key and return it to the caller + emitter.label("__rt_array_edge_key_hash"); + emitter.instruction("cbz x1, __rt_array_edge_key_hash_head"); // which == 0 selects the insertion-order head + emitter.instruction("ldr x11, [x0, #32]"); // x11 = tail slot index + emitter.instruction("b __rt_array_edge_key_hash_slot"); // load the selected entry + emitter.label("__rt_array_edge_key_hash_head"); + emitter.instruction("ldr x11, [x0, #24]"); // x11 = head slot index + emitter.label("__rt_array_edge_key_hash_slot"); + emitter.instruction("cmn x11, #1"); // is the selected slot empty (index == -1)? 
+ emitter.instruction("b.eq __rt_array_edge_key_null"); // empty hashes have no key + emitter.instruction("mov x12, #64"); // hash entry stride in bytes + emitter.instruction("mul x12, x11, x12"); // byte offset of the selected slot + emitter.instruction("add x12, x0, x12"); // advance from the hash base to the slot + emitter.instruction("add x12, x12, #40"); // skip the 40-byte hash header + emitter.instruction("ldr x13, [x12, #16]"); // x13 = key_len (-1 marks an integer key) + emitter.instruction("ldr x14, [x12, #8]"); // x14 = key payload (integer value or string pointer) + emitter.instruction("cmn x13, #1"); // is the entry keyed by an integer? + emitter.instruction("b.eq __rt_array_edge_key_int"); // integer keys box with tag 0 + emitter.instruction("mov x0, #1"); // value_tag = 1 (string) + emitter.instruction("mov x1, x14"); // value_lo = key string pointer + emitter.instruction("mov x2, x13"); // value_hi = key string length + emitter.instruction("b __rt_mixed_from_value"); // box (and persist) the string key and return it + emitter.label("__rt_array_edge_key_int"); + emitter.instruction("mov x0, #0"); // value_tag = 0 (integer) + emitter.instruction("mov x1, x14"); // value_lo = integer key + emitter.instruction("mov x2, #0"); // value_hi unused for integers + emitter.instruction("b __rt_mixed_from_value"); // box the integer key and return it to the caller + emitter.label("__rt_array_edge_key_mixed"); + emitter.instruction("ldr x9, [x0]"); // load the boxed mixed value tag + emitter.instruction("cmp x9, #4"); // does the cell box an indexed array? + emitter.instruction("b.eq __rt_array_edge_key_unwrap"); // unwrap indexed array payloads + emitter.instruction("cmp x9, #5"); // does the cell box an associative array? 
+ emitter.instruction("b.eq __rt_array_edge_key_unwrap"); // unwrap associative array payloads + emitter.instruction("b __rt_array_edge_key_null"); // non-array mixed payloads have no key + emitter.label("__rt_array_edge_key_unwrap"); + emitter.instruction("ldr x0, [x0, #8]"); // unbox the container pointer from mixed[8] + emitter.instruction("b __rt_array_edge_key"); // re-dispatch with the same which selector in x1 + emitter.label("__rt_array_edge_key_null"); + emitter.instruction("mov x0, #8"); // value_tag = 8 (null) + emitter.instruction("movz x1, #0xFFFE"); // value_lo = null sentinel bits [15:0] + emitter.instruction("movk x1, #0xFFFF, lsl #16"); // value_lo = null sentinel bits [31:16] + emitter.instruction("movk x1, #0xFFFF, lsl #32"); // value_lo = null sentinel bits [47:32] + emitter.instruction("movk x1, #0x7FFF, lsl #48"); // value_lo = null sentinel bits [63:48] = 0x7FFFFFFFFFFFFFFE + emitter.instruction("mov x2, #0"); // value_hi unused + emitter.instruction("b __rt_mixed_from_value"); // box the null sentinel and return it to the caller +} + +/// x86_64 Linux implementation of `__rt_array_edge_key`. +/// Input: rdi = container pointer, rsi = which (0 = first, 1 = last) +/// Output: rax = boxed Mixed key (tail-call result of `__rt_mixed_from_value`) +fn emit_array_edge_key_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_edge_key ---"); + emitter.label_global("__rt_array_edge_key"); + emitter.instruction("movzx eax, BYTE PTR [rdi - 8]"); // load the low-byte heap kind from the uniform header + emitter.instruction("cmp eax, 5"); // is the container a boxed mixed cell? + emitter.instruction("je __rt_array_edge_key_mixed"); // mixed cells are unwrapped first + emitter.instruction("cmp eax, 2"); // is the container an indexed array? + emitter.instruction("je __rt_array_edge_key_indexed"); // indexed arrays use positional keys + emitter.instruction("cmp eax, 3"); // is the container an associative hash? 
+ emitter.instruction("je __rt_array_edge_key_hash"); // hashes read the head/tail slot + emitter.instruction("jmp __rt_array_edge_key_null"); // any other kind has no key + emitter.label("__rt_array_edge_key_indexed"); + emitter.instruction("mov r10, QWORD PTR [rdi]"); // r10 = element count from the array header + emitter.instruction("test r10, r10"); // is the array empty? + emitter.instruction("je __rt_array_edge_key_null"); // empty arrays have no key + emitter.instruction("test rsi, rsi"); // which == 0 selects the first index? + emitter.instruction("je __rt_array_edge_key_idx_first"); // box the first index + emitter.instruction("sub r10, 1"); // last index = element count - 1 + emitter.instruction("mov rdi, r10"); // value_lo = last index + emitter.instruction("xor esi, esi"); // value_hi unused for integers + emitter.instruction("mov rax, 0"); // value_tag = 0 (integer) + emitter.instruction("jmp __rt_mixed_from_value"); // box the integer key and return it to the caller + emitter.label("__rt_array_edge_key_idx_first"); + emitter.instruction("xor edi, edi"); // value_lo = first index 0 + emitter.instruction("xor esi, esi"); // value_hi unused for integers + emitter.instruction("mov rax, 0"); // value_tag = 0 (integer) + emitter.instruction("jmp __rt_mixed_from_value"); // box the integer key and return it to the caller + emitter.label("__rt_array_edge_key_hash"); + emitter.instruction("test rsi, rsi"); // which == 0 selects the insertion-order head? + emitter.instruction("je __rt_array_edge_key_hash_head"); // load the head slot + emitter.instruction("mov r11, QWORD PTR [rdi + 32]"); // r11 = tail slot index + emitter.instruction("jmp __rt_array_edge_key_hash_slot"); // load the selected entry + emitter.label("__rt_array_edge_key_hash_head"); + emitter.instruction("mov r11, QWORD PTR [rdi + 24]"); // r11 = head slot index + emitter.label("__rt_array_edge_key_hash_slot"); + emitter.instruction("cmp r11, -1"); // is the selected slot empty (index == -1)? 
+ emitter.instruction("je __rt_array_edge_key_null"); // empty hashes have no key + emitter.instruction("mov rcx, r11"); // copy the slot index before scaling it + emitter.instruction("shl rcx, 6"); // convert the slot index into a 64-byte entry offset + emitter.instruction("add rcx, rdi"); // advance from the hash base to the slot + emitter.instruction("add rcx, 40"); // skip the 40-byte hash header + emitter.instruction("mov r8, QWORD PTR [rcx + 16]"); // r8 = key_len (-1 marks an integer key) + emitter.instruction("mov r9, QWORD PTR [rcx + 8]"); // r9 = key payload (integer value or string pointer) + emitter.instruction("cmp r8, -1"); // is the entry keyed by an integer? + emitter.instruction("je __rt_array_edge_key_int"); // integer keys box with tag 0 + emitter.instruction("mov rdi, r9"); // value_lo = key string pointer + emitter.instruction("mov rsi, r8"); // value_hi = key string length + emitter.instruction("mov rax, 1"); // value_tag = 1 (string) + emitter.instruction("jmp __rt_mixed_from_value"); // box (and persist) the string key and return it + emitter.label("__rt_array_edge_key_int"); + emitter.instruction("mov rdi, r9"); // value_lo = integer key + emitter.instruction("xor esi, esi"); // value_hi unused for integers + emitter.instruction("mov rax, 0"); // value_tag = 0 (integer) + emitter.instruction("jmp __rt_mixed_from_value"); // box the integer key and return it to the caller + emitter.label("__rt_array_edge_key_mixed"); + emitter.instruction("mov rax, QWORD PTR [rdi]"); // load the boxed mixed value tag + emitter.instruction("cmp rax, 4"); // does the cell box an indexed array? + emitter.instruction("je __rt_array_edge_key_unwrap"); // unwrap indexed array payloads + emitter.instruction("cmp rax, 5"); // does the cell box an associative array? 
+ emitter.instruction("je __rt_array_edge_key_unwrap"); // unwrap associative array payloads + emitter.instruction("jmp __rt_array_edge_key_null"); // non-array mixed payloads have no key + emitter.label("__rt_array_edge_key_unwrap"); + emitter.instruction("mov rdi, QWORD PTR [rdi + 8]"); // unbox the container pointer from mixed[8] + emitter.instruction("jmp __rt_array_edge_key"); // re-dispatch with the same which selector in rsi + emitter.label("__rt_array_edge_key_null"); + emitter.instruction("mov rdi, 0x7ffffffffffffffe"); // value_lo = shared null sentinel + emitter.instruction("xor esi, esi"); // value_hi unused + emitter.instruction("mov rax, 8"); // value_tag = 8 (null) + emitter.instruction("jmp __rt_mixed_from_value"); // box the null sentinel and return it to the caller +} + diff --git a/src/codegen/runtime/arrays/array_find_any_all.rs b/src/codegen/runtime/arrays/array_find_any_all.rs new file mode 100644 index 000000000..230254f4d --- /dev/null +++ b/src/codegen/runtime/arrays/array_find_any_all.rs @@ -0,0 +1,196 @@ +//! Purpose: +//! Emits the `__rt_array_find_any_all` runtime helper for array_find / array_any / array_all. +//! Walks an indexed array, invoking the predicate callback on each element and returning per the mode. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - mode 0 = find (boxed first matching element or null), 1 = any (bool), 2 = all (bool); scalar elements only (8-byte). + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_find_any_all: predicate-driven search over an indexed array. 
+/// Input: x0 = callback address, x1 = array pointer, x2 = optional environment, x3 = mode +/// Output: x0 = boxed Mixed (find: first match or null), or 1/0 (any/all) +/// +/// mode 0 (find) returns the first element where the callback is truthy, boxed as a Mixed +/// value using the array value_type, or a boxed null when none match. mode 1 (any) returns 1 +/// if any element is truthy. mode 2 (all) returns 1 only if every element is truthy. The +/// callback receives `(element [, env])`; element scalars are read as a single word. +pub fn emit_array_find_any_all(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_find_any_all_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_find_any_all ---"); + emitter.label_global("__rt_array_find_any_all"); + emitter.instruction("sub sp, sp, #80"); // allocate the find/any/all stack frame + emitter.instruction("stp x29, x30, [sp, #64]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #64"); // set up the new frame pointer + emitter.instruction("stp x19, x20, [sp, #48]"); // save callee-saved callback address and environment + emitter.instruction("str x21, [sp, #40]"); // save callee-saved element register + emitter.instruction("mov x19, x0"); // x19 = callback address (callee-saved) + emitter.instruction("mov x20, x2"); // x20 = optional environment (callee-saved) + emitter.instruction("str x1, [sp, #0]"); // save the array pointer + emitter.instruction("str x3, [sp, #8]"); // save the mode selector + emitter.instruction("ldr x9, [x1]"); // load the array length + emitter.instruction("str x9, [sp, #16]"); // save the array length + emitter.instruction("ldr x9, [x1, #-8]"); // load the uniform heap-kind header word + emitter.instruction("lsr x9, x9, #8"); // shift the packed value_type into the low bits + emitter.instruction("and x9, x9, #0x7f"); // isolate the indexed-array value_type (also the Mixed tag) + 
emitter.instruction("str x9, [sp, #24]"); // save the element value_type for find boxing + emitter.instruction("str xzr, [sp, #32]"); // index i = 0 + emitter.label("__rt_array_find_any_all_loop"); + emitter.instruction("ldr x9, [sp, #16]"); // reload the length + emitter.instruction("ldr x10, [sp, #32]"); // reload the index + emitter.instruction("cmp x10, x9"); // has the index reached the length? + emitter.instruction("b.ge __rt_array_find_any_all_end"); // no element matched / all visited + emitter.instruction("ldr x1, [sp, #0]"); // reload the array pointer + emitter.instruction("add x1, x1, #24"); // skip the 24-byte indexed-array header + emitter.instruction("ldr x21, [x1, x10, lsl #3]"); // load element[i] into the callee-saved element register + emitter.instruction("mov x0, x21"); // pass the element as the first callback argument + emitter.instruction("cbz x20, __rt_array_find_any_all_call"); // no environment keeps the one-argument callback ABI + emitter.instruction("mov x1, x20"); // pass the environment as the second callback argument + emitter.label("__rt_array_find_any_all_call"); + emitter.instruction("blr x19"); // call the predicate callback; truthy result in x0 + emitter.instruction("ldr x11, [sp, #8]"); // reload the mode selector + emitter.instruction("cbz x11, __rt_array_find_any_all_find"); // mode 0 is find + emitter.instruction("cmp x11, #1"); // mode 1 is any? 
+ emitter.instruction("b.eq __rt_array_find_any_all_any"); // handle the any mode + emitter.instruction("cbz x0, __rt_array_find_any_all_zero"); // all mode: a falsy element returns 0 + emitter.instruction("b __rt_array_find_any_all_next"); // all mode: keep checking + emitter.label("__rt_array_find_any_all_any"); + emitter.instruction("cbnz x0, __rt_array_find_any_all_one"); // any mode: a truthy element returns 1 + emitter.instruction("b __rt_array_find_any_all_next"); // any mode: keep checking + emitter.label("__rt_array_find_any_all_find"); + emitter.instruction("cbz x0, __rt_array_find_any_all_next"); // find mode: skip falsy elements + emitter.instruction("ldr x0, [sp, #24]"); // value_type tag for boxing the found element + emitter.instruction("mov x1, x21"); // found element low word + emitter.instruction("mov x2, #0"); // found element high word unused + emitter.instruction("bl __rt_mixed_from_value"); // box the found element as a Mixed value + emitter.instruction("b __rt_array_find_any_all_ret"); // return the boxed element + emitter.label("__rt_array_find_any_all_next"); + emitter.instruction("ldr x10, [sp, #32]"); // reload the index + emitter.instruction("add x10, x10, #1"); // advance to the next element + emitter.instruction("str x10, [sp, #32]"); // save the advanced index + emitter.instruction("b __rt_array_find_any_all_loop"); // continue the predicate loop + emitter.label("__rt_array_find_any_all_end"); + emitter.instruction("ldr x11, [sp, #8]"); // reload the mode selector + emitter.instruction("cbz x11, __rt_array_find_any_all_findnull"); // find found nothing: return boxed null + emitter.instruction("cmp x11, #1"); // any mode? 
+ emitter.instruction("b.eq __rt_array_find_any_all_zero"); // any matched nothing: return 0 + emitter.instruction("b __rt_array_find_any_all_one"); // all elements passed: return 1 + emitter.label("__rt_array_find_any_all_findnull"); + emitter.instruction("mov x0, #8"); // value_tag 8 = null + emitter.instruction("movz x1, #0xFFFE"); // null sentinel bits [15:0] + emitter.instruction("movk x1, #0xFFFF, lsl #16"); // null sentinel bits [31:16] + emitter.instruction("movk x1, #0xFFFF, lsl #32"); // null sentinel bits [47:32] + emitter.instruction("movk x1, #0x7FFF, lsl #48"); // null sentinel bits [63:48] = 0x7FFFFFFFFFFFFFFE + emitter.instruction("mov x2, #0"); // value high word unused + emitter.instruction("bl __rt_mixed_from_value"); // box the null sentinel + emitter.instruction("b __rt_array_find_any_all_ret"); // return the boxed null + emitter.label("__rt_array_find_any_all_one"); + emitter.instruction("mov x0, #1"); // boolean result: true + emitter.instruction("b __rt_array_find_any_all_ret"); // return the boolean result + emitter.label("__rt_array_find_any_all_zero"); + emitter.instruction("mov x0, #0"); // boolean result: false + emitter.label("__rt_array_find_any_all_ret"); + emitter.instruction("ldr x21, [sp, #40]"); // restore the callee-saved element register + emitter.instruction("ldp x19, x20, [sp, #48]"); // restore callee-saved callback address and environment + emitter.instruction("ldp x29, x30, [sp, #64]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #80"); // deallocate the stack frame + emitter.instruction("ret"); // return the result in x0 +} + +/// x86_64 Linux implementation of `__rt_array_find_any_all`. 
+/// Input: rdi = callback, rsi = array pointer, rdx = optional environment, rcx = mode +/// Output: rax = boxed Mixed (find) or 1/0 (any/all) +fn emit_array_find_any_all_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_find_any_all ---"); + emitter.label_global("__rt_array_find_any_all"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("push r12"); // preserve the callback address across the loop + emitter.instruction("push r13"); // preserve the element register across the callback + emitter.instruction("push r14"); // preserve the environment across the callback + emitter.instruction("sub rsp, 40"); // reserve local slots for array/mode/length/value_type/index + emitter.instruction("mov r12, rdi"); // r12 = callback address (callee-saved) + emitter.instruction("mov r14, rdx"); // r14 = optional environment (callee-saved) + emitter.instruction("mov QWORD PTR [rbp - 32], rsi"); // save the array pointer + emitter.instruction("mov QWORD PTR [rbp - 40], rcx"); // save the mode selector + emitter.instruction("mov rax, QWORD PTR [rsi]"); // load the array length + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // save the array length + emitter.instruction("mov rax, QWORD PTR [rsi - 8]"); // load the uniform heap-kind header word + emitter.instruction("shr rax, 8"); // shift the packed value_type into the low bits + emitter.instruction("and rax, 127"); // isolate the indexed-array value_type (also the Mixed tag) + emitter.instruction("mov QWORD PTR [rbp - 56], rax"); // save the element value_type for find boxing + emitter.instruction("mov QWORD PTR [rbp - 64], 0"); // index i = 0 + emitter.label("__rt_array_find_any_all_loop"); + emitter.instruction("mov rax, QWORD PTR [rbp - 64]"); // reload the index + emitter.instruction("cmp rax, QWORD PTR [rbp - 48]"); // has the index reached the length? 
+ emitter.instruction("jge __rt_array_find_any_all_end"); // no element matched / all visited + emitter.instruction("mov r10, QWORD PTR [rbp - 32]"); // reload the array pointer + emitter.instruction("mov r13, QWORD PTR [r10 + rax * 8 + 24]"); // load element[i] into the callee-saved element register + emitter.instruction("mov rdi, r13"); // pass the element as the first callback argument + emitter.instruction("test r14, r14"); // is an environment present? + emitter.instruction("jz __rt_array_find_any_all_call"); // no environment keeps the one-argument callback ABI + emitter.instruction("mov rsi, r14"); // pass the environment as the second callback argument + emitter.label("__rt_array_find_any_all_call"); + emitter.instruction("call r12"); // call the predicate callback; truthy result in rax + emitter.instruction("mov r11, QWORD PTR [rbp - 40]"); // reload the mode selector + emitter.instruction("test r11, r11"); // mode 0 is find? + emitter.instruction("jz __rt_array_find_any_all_find"); // handle the find mode + emitter.instruction("cmp r11, 1"); // mode 1 is any? + emitter.instruction("je __rt_array_find_any_all_any"); // handle the any mode + emitter.instruction("test rax, rax"); // all mode: is this element falsy? + emitter.instruction("jz __rt_array_find_any_all_zero"); // all mode: a falsy element returns 0 + emitter.instruction("jmp __rt_array_find_any_all_next"); // all mode: keep checking + emitter.label("__rt_array_find_any_all_any"); + emitter.instruction("test rax, rax"); // any mode: is this element truthy? + emitter.instruction("jnz __rt_array_find_any_all_one"); // any mode: a truthy element returns 1 + emitter.instruction("jmp __rt_array_find_any_all_next"); // any mode: keep checking + emitter.label("__rt_array_find_any_all_find"); + emitter.instruction("test rax, rax"); // find mode: is this element truthy? 
+ emitter.instruction("jz __rt_array_find_any_all_next"); // find mode: skip falsy elements + emitter.instruction("mov rax, QWORD PTR [rbp - 56]"); // value_type tag for boxing the found element + emitter.instruction("mov rdi, r13"); // found element low word + emitter.instruction("xor esi, esi"); // found element high word unused + emitter.instruction("call __rt_mixed_from_value"); // box the found element as a Mixed value + emitter.instruction("jmp __rt_array_find_any_all_ret"); // return the boxed element + emitter.label("__rt_array_find_any_all_next"); + emitter.instruction("mov rax, QWORD PTR [rbp - 64]"); // reload the index + emitter.instruction("add rax, 1"); // advance to the next element + emitter.instruction("mov QWORD PTR [rbp - 64], rax"); // save the advanced index + emitter.instruction("jmp __rt_array_find_any_all_loop"); // continue the predicate loop + emitter.label("__rt_array_find_any_all_end"); + emitter.instruction("mov r11, QWORD PTR [rbp - 40]"); // reload the mode selector + emitter.instruction("test r11, r11"); // find mode found nothing? + emitter.instruction("jz __rt_array_find_any_all_findnull"); // find returns boxed null + emitter.instruction("cmp r11, 1"); // any mode? 
+ emitter.instruction("je __rt_array_find_any_all_zero"); // any matched nothing: return 0 + emitter.instruction("jmp __rt_array_find_any_all_one"); // all elements passed: return 1 + emitter.label("__rt_array_find_any_all_findnull"); + emitter.instruction("mov rdi, 0x7ffffffffffffffe"); // value low word = shared null sentinel + emitter.instruction("xor esi, esi"); // value high word unused + emitter.instruction("mov rax, 8"); // value_tag 8 = null + emitter.instruction("call __rt_mixed_from_value"); // box the null sentinel + emitter.instruction("jmp __rt_array_find_any_all_ret"); // return the boxed null + emitter.label("__rt_array_find_any_all_one"); + emitter.instruction("mov rax, 1"); // boolean result: true + emitter.instruction("jmp __rt_array_find_any_all_ret"); // return the boolean result + emitter.label("__rt_array_find_any_all_zero"); + emitter.instruction("xor eax, eax"); // boolean result: false + emitter.label("__rt_array_find_any_all_ret"); + emitter.instruction("add rsp, 40"); // release the local slots + emitter.instruction("pop r14"); // restore the environment register + emitter.instruction("pop r13"); // restore the element register + emitter.instruction("pop r12"); // restore the callback register + emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return the result in rax +} + diff --git a/src/codegen/runtime/arrays/array_is_list.rs b/src/codegen/runtime/arrays/array_is_list.rs new file mode 100644 index 000000000..33d28f776 --- /dev/null +++ b/src/codegen/runtime/arrays/array_is_list.rs @@ -0,0 +1,125 @@ +//! Purpose: +//! Emits the `__rt_array_is_list` runtime helper assembly for array_is_list. +//! Determines whether a PHP array value has sequential integer keys 0..n-1 in insertion order. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! 
- Indexed arrays are always lists; hashes are walked through the insertion-order chain; boxed mixed cells are unwrapped once. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_is_list: report whether a container is a PHP list (keys 0..n-1 in order). +/// Input: x0 = container pointer (indexed array, hash, or boxed mixed cell) +/// Output: x0 = 1 when the value is a list, 0 otherwise +/// +/// Indexed arrays are always lists. Hash tables are walked along the insertion-order +/// chain, requiring every key to be the integer matching its position. Boxed mixed +/// cells holding an array payload are unwrapped once and re-dispatched. +pub fn emit_array_is_list(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_is_list_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_is_list ---"); + emitter.label_global("__rt_array_is_list"); + emitter.instruction("ldr x9, [x0, #-8]"); // load the uniform heap-kind header word + emitter.instruction("and x9, x9, #0xff"); // isolate the low-byte heap kind + emitter.instruction("cmp x9, #2"); // is the value an indexed array? + emitter.instruction("b.eq __rt_array_is_list_one"); // indexed arrays are always lists + emitter.instruction("cmp x9, #5"); // is the value a boxed mixed cell? + emitter.instruction("b.eq __rt_array_is_list_mixed"); // mixed cells must be unwrapped before inspection + emitter.instruction("cmp x9, #3"); // is the value an associative hash? + emitter.instruction("b.ne __rt_array_is_list_zero"); // any other kind is not a PHP array + emitter.comment("-- walk the hash insertion-order chain checking keys 0,1,2,... 
--"); + emitter.instruction("mov x10, #0"); // expected next integer key starts at 0 + emitter.instruction("ldr x11, [x0, #24]"); // x11 = head slot index from the hash header + emitter.label("__rt_array_is_list_loop"); + emitter.instruction("cmn x11, #1"); // has the insertion chain reached its end (slot == -1)? + emitter.instruction("b.eq __rt_array_is_list_one"); // all keys matched 0..n-1 in order, including the empty hash + emitter.instruction("mov x12, #64"); // hash entry stride in bytes + emitter.instruction("mul x12, x11, x12"); // byte offset of the current slot + emitter.instruction("add x12, x0, x12"); // advance from the hash base to the slot + emitter.instruction("add x12, x12, #40"); // skip the 40-byte hash header + emitter.instruction("ldr x13, [x12, #16]"); // x13 = key_len (-1 marks an integer key) + emitter.instruction("cmn x13, #1"); // is this entry keyed by an integer? + emitter.instruction("b.ne __rt_array_is_list_zero"); // a string key cannot appear in a list + emitter.instruction("ldr x14, [x12, #8]"); // x14 = integer key payload + emitter.instruction("cmp x14, x10"); // does the key equal the expected position? + emitter.instruction("b.ne __rt_array_is_list_zero"); // a gap or reorder breaks list shape + emitter.instruction("add x10, x10, #1"); // advance the expected position + emitter.instruction("ldr x11, [x12, #56]"); // x11 = next slot index in insertion order + emitter.instruction("b __rt_array_is_list_loop"); // continue checking the next entry + emitter.label("__rt_array_is_list_mixed"); + emitter.instruction("ldr x9, [x0]"); // load the boxed mixed value tag + emitter.instruction("cmp x9, #4"); // does the cell box an indexed array? + emitter.instruction("b.eq __rt_array_is_list_unwrap"); // unwrap indexed array payloads + emitter.instruction("cmp x9, #5"); // does the cell box an associative array? 
+ emitter.instruction("b.eq __rt_array_is_list_unwrap"); // unwrap associative array payloads + emitter.instruction("b __rt_array_is_list_zero"); // non-array mixed payloads are not lists + emitter.label("__rt_array_is_list_unwrap"); + emitter.instruction("ldr x0, [x0, #8]"); // unbox the container pointer from mixed[8] + emitter.instruction("b __rt_array_is_list"); // re-dispatch on the unboxed container + emitter.label("__rt_array_is_list_one"); + emitter.instruction("mov x0, #1"); // result: the value is a list + emitter.instruction("ret"); // return to caller + emitter.label("__rt_array_is_list_zero"); + emitter.instruction("mov x0, #0"); // result: the value is not a list + emitter.instruction("ret"); // return to caller +} + +/// x86_64 Linux implementation of `__rt_array_is_list`. +/// Input: rdi = container pointer +/// Output: rax = 1 when the value is a list, 0 otherwise +fn emit_array_is_list_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_is_list ---"); + emitter.label_global("__rt_array_is_list"); + emitter.instruction("movzx eax, BYTE PTR [rdi - 8]"); // load the low-byte heap kind from the uniform header + emitter.instruction("cmp eax, 2"); // is the value an indexed array? + emitter.instruction("je __rt_array_is_list_one"); // indexed arrays are always lists + emitter.instruction("cmp eax, 5"); // is the value a boxed mixed cell? + emitter.instruction("je __rt_array_is_list_mixed"); // mixed cells must be unwrapped before inspection + emitter.instruction("cmp eax, 3"); // is the value an associative hash? + emitter.instruction("jne __rt_array_is_list_zero"); // any other kind is not a PHP array + emitter.comment("-- walk the hash insertion-order chain checking keys 0,1,2,... 
--"); + emitter.instruction("xor r10, r10"); // expected next integer key starts at 0 + emitter.instruction("mov r11, QWORD PTR [rdi + 24]"); // r11 = head slot index from the hash header + emitter.label("__rt_array_is_list_loop"); + emitter.instruction("cmp r11, -1"); // has the insertion chain reached its end? + emitter.instruction("je __rt_array_is_list_one"); // all keys matched 0..n-1 in order, including the empty hash + emitter.instruction("mov rcx, r11"); // copy the slot index before scaling it + emitter.instruction("shl rcx, 6"); // convert the slot index into a 64-byte entry offset + emitter.instruction("add rcx, rdi"); // advance from the hash base to the slot + emitter.instruction("add rcx, 40"); // skip the 40-byte hash header + emitter.instruction("mov r8, QWORD PTR [rcx + 16]"); // r8 = key_len (-1 marks an integer key) + emitter.instruction("cmp r8, -1"); // is this entry keyed by an integer? + emitter.instruction("jne __rt_array_is_list_zero"); // a string key cannot appear in a list + emitter.instruction("mov r9, QWORD PTR [rcx + 8]"); // r9 = integer key payload + emitter.instruction("cmp r9, r10"); // does the key equal the expected position? + emitter.instruction("jne __rt_array_is_list_zero"); // a gap or reorder breaks list shape + emitter.instruction("add r10, 1"); // advance the expected position + emitter.instruction("mov r11, QWORD PTR [rcx + 56]"); // r11 = next slot index in insertion order + emitter.instruction("jmp __rt_array_is_list_loop"); // continue checking the next entry + emitter.label("__rt_array_is_list_mixed"); + emitter.instruction("mov rax, QWORD PTR [rdi]"); // load the boxed mixed value tag + emitter.instruction("cmp rax, 4"); // does the cell box an indexed array? + emitter.instruction("je __rt_array_is_list_unwrap"); // unwrap indexed array payloads + emitter.instruction("cmp rax, 5"); // does the cell box an associative array? 
+ emitter.instruction("je __rt_array_is_list_unwrap"); // unwrap associative array payloads + emitter.instruction("jmp __rt_array_is_list_zero"); // non-array mixed payloads are not lists + emitter.label("__rt_array_is_list_unwrap"); + emitter.instruction("mov rdi, QWORD PTR [rdi + 8]"); // unbox the container pointer from mixed[8] + emitter.instruction("jmp __rt_array_is_list"); // re-dispatch on the unboxed container + emitter.label("__rt_array_is_list_one"); + emitter.instruction("mov rax, 1"); // result: the value is a list + emitter.instruction("ret"); // return to caller + emitter.label("__rt_array_is_list_zero"); + emitter.instruction("xor rax, rax"); // result: the value is not a list + emitter.instruction("ret"); // return to caller +} + diff --git a/src/codegen/runtime/arrays/array_merge_recursive.rs b/src/codegen/runtime/arrays/array_merge_recursive.rs new file mode 100644 index 000000000..f0d4299b8 --- /dev/null +++ b/src/codegen/runtime/arrays/array_merge_recursive.rs @@ -0,0 +1,465 @@ +//! Purpose: +//! Emits the `__rt_amr_box_value` and `__rt_array_merge_recursive` runtime helpers for array_merge_recursive. +//! Merges two associative arrays, recursing on array-valued key collisions and combining scalar collisions into lists. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - Integer keys append with renumbering; string-key collisions recurse (both assoc) or wrap-and-merge; temporaries are released. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// amr_box_value: wrap a single runtime value into a new one-element list hash {0: value}. 
+/// Input: x0 = value tag, x1 = value low word, x2 = value high word +/// Output: x0 = new owned hash whose only entry is integer key 0 -> the (retained) value +pub fn emit_amr_box_value(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_amr_box_value_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: amr_box_value ---"); + emitter.label_global("__rt_amr_box_value"); + emitter.instruction("sub sp, sp, #48"); // allocate the box-value stack frame + emitter.instruction("stp x29, x30, [sp, #32]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #32"); // set up the new frame pointer + emitter.instruction("str x0, [sp, #0]"); // save the value tag + emitter.instruction("str x1, [sp, #8]"); // save the value low word + emitter.instruction("str x2, [sp, #16]"); // save the value high word + emitter.instruction("cmp x0, #1"); // is the value a string? + emitter.instruction("b.eq __rt_amr_box_value_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp x0, #4"); // is the value below the heap-backed tag range? + emitter.instruction("b.lt __rt_amr_box_value_new"); // scalar values need no retain + emitter.instruction("cmp x0, #7"); // is the value above the heap-backed tag range? 
+ emitter.instruction("b.gt __rt_amr_box_value_new"); // non-heap tags need no retain + emitter.instruction("ldr x0, [sp, #8]"); // load the heap-backed value low word + emitter.instruction("bl __rt_incref"); // retain the heap-backed value for the wrapper hash + emitter.instruction("b __rt_amr_box_value_new"); // continue to wrapper allocation + emitter.label("__rt_amr_box_value_persist"); + emitter.instruction("ldr x1, [sp, #8]"); // string pointer to persist + emitter.instruction("ldr x2, [sp, #16]"); // string length to persist + emitter.instruction("bl __rt_str_persist"); // copy the string to an independent heap block, x1 = new pointer + emitter.instruction("str x1, [sp, #8]"); // store the persisted string pointer as the wrapper value + emitter.instruction("str x2, [sp, #16]"); // store the persisted string length + emitter.label("__rt_amr_box_value_new"); + emitter.instruction("mov x0, #8"); // initial capacity for the wrapper hash + emitter.instruction("mov x1, #7"); // value_type 7 = mixed + emitter.instruction("bl __rt_hash_new"); // create the wrapper hash, x0 = wrapper + emitter.instruction("mov x1, #0"); // integer key 0 for the single entry + emitter.instruction("mov x2, #-1"); // key_hi = -1 marks an integer key + emitter.instruction("ldr x3, [sp, #8]"); // value low word + emitter.instruction("ldr x4, [sp, #16]"); // value high word + emitter.instruction("ldr x5, [sp, #0]"); // value runtime tag + emitter.instruction("bl __rt_hash_set"); // insert the value at key 0, x0 = wrapper (maybe realloc) + emitter.instruction("ldp x29, x30, [sp, #32]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #48"); // deallocate the stack frame + emitter.instruction("ret"); // return the wrapper hash in x0 +} + +/// x86_64 Linux implementation of `__rt_amr_box_value`. 
+/// Input: rdi = value tag, rsi = value low word, rdx = value high word +/// Output: rax = new owned one-element list hash {0: value} +fn emit_amr_box_value_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: amr_box_value ---"); + emitter.label_global("__rt_amr_box_value"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("sub rsp, 32"); // reserve local slots for the value triple + emitter.instruction("mov QWORD PTR [rbp - 8], rdi"); // save the value tag + emitter.instruction("mov QWORD PTR [rbp - 16], rsi"); // save the value low word + emitter.instruction("mov QWORD PTR [rbp - 24], rdx"); // save the value high word + emitter.instruction("cmp rdi, 1"); // is the value a string? + emitter.instruction("je __rt_amr_box_value_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp rdi, 4"); // is the value below the heap-backed tag range? + emitter.instruction("jl __rt_amr_box_value_new"); // scalar values need no retain + emitter.instruction("cmp rdi, 7"); // is the value above the heap-backed tag range? 
+ emitter.instruction("jg __rt_amr_box_value_new"); // non-heap tags need no retain + emitter.instruction("mov rdi, QWORD PTR [rbp - 16]"); // load the heap-backed value low word + emitter.instruction("call __rt_incref"); // retain the heap-backed value for the wrapper hash + emitter.instruction("jmp __rt_amr_box_value_new"); // continue to wrapper allocation + emitter.label("__rt_amr_box_value_persist"); + emitter.instruction("mov rax, QWORD PTR [rbp - 16]"); // string pointer to persist + emitter.instruction("mov rdx, QWORD PTR [rbp - 24]"); // string length to persist + emitter.instruction("call __rt_str_persist"); // copy the string to an independent heap block, rax = new pointer + emitter.instruction("mov QWORD PTR [rbp - 16], rax"); // store the persisted string pointer as the wrapper value + emitter.instruction("mov QWORD PTR [rbp - 24], rdx"); // store the persisted string length + emitter.label("__rt_amr_box_value_new"); + emitter.instruction("mov rdi, 8"); // initial capacity for the wrapper hash + emitter.instruction("mov rsi, 7"); // value_type 7 = mixed + emitter.instruction("call __rt_hash_new"); // create the wrapper hash, rax = wrapper + emitter.instruction("mov rdi, rax"); // wrapper hash pointer for hash_set + emitter.instruction("mov rsi, 0"); // integer key 0 for the single entry + emitter.instruction("mov rdx, -1"); // key_hi = -1 marks an integer key + emitter.instruction("mov rcx, QWORD PTR [rbp - 16]"); // value low word + emitter.instruction("mov r8, QWORD PTR [rbp - 24]"); // value high word + emitter.instruction("mov r9, QWORD PTR [rbp - 8]"); // value runtime tag + emitter.instruction("call __rt_hash_set"); // insert the value at key 0, rax = wrapper (maybe realloc) + emitter.instruction("add rsp, 32"); // release the local slots + emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return the wrapper hash in rax +} + +/// array_merge_recursive: PHP-style recursive merge of two associative 
arrays. +/// Input: x0 = first hash, x1 = second hash +/// Output: x0 = new owned merged hash +/// +/// Integer-keyed entries from both inputs append with sequential renumbering. String keys +/// that collide recurse when both values are associative arrays, otherwise each value is +/// wrapped to a list and merged (combining scalars). Kept values are retained; wrapper +/// temporaries are released. Nested indexed-array values are treated as opaque (wrapped). +pub fn emit_array_merge_recursive(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_merge_recursive_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_merge_recursive ---"); + emitter.label_global("__rt_array_merge_recursive"); + emitter.instruction("sub sp, sp, #192"); // allocate the merge-recursive stack frame + emitter.instruction("stp x29, x30, [sp, #176]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #176"); // set up the new frame pointer + emitter.instruction("str x0, [sp, #0]"); // save first hash pointer + emitter.instruction("str x1, [sp, #8]"); // save second hash pointer + emitter.instruction("mov x0, #16"); // initial capacity for the result hash + emitter.instruction("mov x1, #7"); // value_type 7 = mixed + emitter.instruction("bl __rt_hash_new"); // create the result hash, x0 = result + emitter.instruction("str x0, [sp, #16]"); // save the result hash pointer + emitter.instruction("str xzr, [sp, #24]"); // next integer key counter = 0 + emitter.instruction("str xzr, [sp, #32]"); // source selector which = 0 + emitter.label("__rt_amr_which_loop"); + emitter.instruction("ldr x9, [sp, #32]"); // reload the source selector + emitter.instruction("cmp x9, #2"); // have both sources been processed? 
+ emitter.instruction("b.ge __rt_amr_done"); // finish once both inputs are merged + emitter.instruction("cbz x9, __rt_amr_pick_a"); // selector 0 chooses the first input + emitter.instruction("ldr x10, [sp, #8]"); // selector 1 chooses the second input + emitter.instruction("b __rt_amr_pick_done"); // store the chosen source + emitter.label("__rt_amr_pick_a"); + emitter.instruction("ldr x10, [sp, #0]"); // load the first input as the current source + emitter.label("__rt_amr_pick_done"); + emitter.instruction("str x10, [sp, #48]"); // save the current source pointer + emitter.instruction("str xzr, [sp, #40]"); // iterator cursor = 0 + emitter.label("__rt_amr_entry_loop"); + emitter.instruction("ldr x0, [sp, #48]"); // reload the current source pointer + emitter.instruction("ldr x1, [sp, #40]"); // reload the iterator cursor + emitter.instruction("bl __rt_hash_iter_next"); // next entry: x0=cursor,x1=kptr,x2=klen,x3=vlo,x4=vhi,x5=vtag + emitter.instruction("cmn x0, #1"); // has iteration reached the end? + emitter.instruction("b.eq __rt_amr_next_which"); // advance to the next source when done + emitter.instruction("str x0, [sp, #40]"); // save the next iterator cursor + emitter.instruction("str x1, [sp, #56]"); // save key pointer + emitter.instruction("str x2, [sp, #64]"); // save key length + emitter.instruction("str x3, [sp, #72]"); // save value low word + emitter.instruction("str x4, [sp, #80]"); // save value high word + emitter.instruction("str x5, [sp, #88]"); // save value runtime tag + emitter.instruction("cmn x2, #1"); // is this an integer key (key length == -1)? 
+ emitter.instruction("b.eq __rt_amr_int_key"); // integer keys append with renumbering + emitter.comment("-- string key: look it up in the result --"); + emitter.instruction("ldr x0, [sp, #16]"); // result hash pointer + emitter.instruction("ldr x1, [sp, #56]"); // key pointer + emitter.instruction("ldr x2, [sp, #64]"); // key length + emitter.instruction("bl __rt_hash_get"); // look up the key: x0=found,x1=e_lo,x2=e_hi,x3=e_tag + emitter.instruction("cbz x0, __rt_amr_str_new"); // absent key is added directly + emitter.instruction("str x1, [sp, #96]"); // save existing value low word + emitter.instruction("str x2, [sp, #104]"); // save existing value high word + emitter.instruction("str x3, [sp, #112]"); // save existing value runtime tag + emitter.comment("-- build the existing operand (keep assoc arrays, wrap others) --"); + emitter.instruction("cmp x3, #5"); // is the existing value an associative array? + emitter.instruction("b.ne __rt_amr_ea_wrap"); // wrap non-assoc existing values into a list + emitter.instruction("str x1, [sp, #120]"); // keep the associative array as the existing operand + emitter.instruction("str xzr, [sp, #136]"); // existing operand is borrowed (not newly created) + emitter.instruction("b __rt_amr_na"); // build the new operand next + emitter.label("__rt_amr_ea_wrap"); + emitter.instruction("ldr x0, [sp, #112]"); // existing value tag + emitter.instruction("ldr x1, [sp, #96]"); // existing value low word + emitter.instruction("ldr x2, [sp, #104]"); // existing value high word + emitter.instruction("bl __rt_amr_box_value"); // wrap the existing value into a list, x0 = wrapper + emitter.instruction("str x0, [sp, #120]"); // save the existing operand + emitter.instruction("mov x9, #1"); // mark the existing operand as newly created + emitter.instruction("str x9, [sp, #136]"); // save the existing-operand ownership flag + emitter.label("__rt_amr_na"); + emitter.instruction("ldr x3, [sp, #88]"); // reload the new value tag + 
emitter.instruction("cmp x3, #5"); // is the new value an associative array? + emitter.instruction("b.ne __rt_amr_na_wrap"); // wrap non-assoc new values into a list + emitter.instruction("ldr x9, [sp, #72]"); // reload the new value low word + emitter.instruction("str x9, [sp, #128]"); // keep the associative array as the new operand + emitter.instruction("str xzr, [sp, #144]"); // new operand is borrowed (not newly created) + emitter.instruction("b __rt_amr_merge"); // merge the two operands + emitter.label("__rt_amr_na_wrap"); + emitter.instruction("ldr x0, [sp, #88]"); // new value tag + emitter.instruction("ldr x1, [sp, #72]"); // new value low word + emitter.instruction("ldr x2, [sp, #80]"); // new value high word + emitter.instruction("bl __rt_amr_box_value"); // wrap the new value into a list, x0 = wrapper + emitter.instruction("str x0, [sp, #128]"); // save the new operand + emitter.instruction("mov x9, #1"); // mark the new operand as newly created + emitter.instruction("str x9, [sp, #144]"); // save the new-operand ownership flag + emitter.label("__rt_amr_merge"); + emitter.instruction("ldr x0, [sp, #120]"); // existing operand + emitter.instruction("ldr x1, [sp, #128]"); // new operand + emitter.instruction("bl __rt_array_merge_recursive"); // recursively merge the two operands, x0 = merged + emitter.instruction("mov x3, x0"); // merged hash becomes the new value low word + emitter.instruction("ldr x0, [sp, #16]"); // result hash pointer + emitter.instruction("ldr x1, [sp, #56]"); // key pointer + emitter.instruction("ldr x2, [sp, #64]"); // key length + emitter.instruction("mov x4, #0"); // array values carry no high word + emitter.instruction("mov x5, #5"); // value tag 5 = associative array + emitter.instruction("bl __rt_hash_set"); // store the merged value (releases the previous value) + emitter.instruction("str x0, [sp, #16]"); // update the result pointer after possible reallocation + emitter.instruction("ldr x9, [sp, #136]"); // reload the 
existing-operand ownership flag + emitter.instruction("cbz x9, __rt_amr_free_na"); // skip releasing a borrowed existing operand + emitter.instruction("ldr x0, [sp, #120]"); // load the newly created existing operand + emitter.instruction("bl __rt_decref_hash"); // release the existing-operand wrapper + emitter.label("__rt_amr_free_na"); + emitter.instruction("ldr x9, [sp, #144]"); // reload the new-operand ownership flag + emitter.instruction("cbz x9, __rt_amr_entry_loop"); // skip releasing a borrowed new operand + emitter.instruction("ldr x0, [sp, #128]"); // load the newly created new operand + emitter.instruction("bl __rt_decref_hash"); // release the new-operand wrapper + emitter.instruction("b __rt_amr_entry_loop"); // continue with the next entry + emitter.label("__rt_amr_str_new"); + emitter.instruction("ldr x9, [sp, #88]"); // reload the value runtime tag + emitter.instruction("cmp x9, #1"); // is the value a string? + emitter.instruction("b.eq __rt_amr_str_new_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp x9, #4"); // is the value below the heap-backed tag range? + emitter.instruction("b.lt __rt_amr_str_new_set"); // scalar values need no retain + emitter.instruction("cmp x9, #7"); // is the value above the heap-backed tag range? 
+ emitter.instruction("b.gt __rt_amr_str_new_set"); // non-heap tags need no retain + emitter.instruction("ldr x0, [sp, #72]"); // load the heap-backed value low word + emitter.instruction("bl __rt_incref"); // retain the heap-backed value for the result + emitter.instruction("b __rt_amr_str_new_set"); // continue to the insertion + emitter.label("__rt_amr_str_new_persist"); + emitter.instruction("ldr x1, [sp, #72]"); // string pointer to persist + emitter.instruction("ldr x2, [sp, #80]"); // string length to persist + emitter.instruction("bl __rt_str_persist"); // copy the string into an independent heap block, x1 = new pointer + emitter.instruction("str x1, [sp, #72]"); // store the persisted string pointer + emitter.instruction("str x2, [sp, #80]"); // store the persisted string length + emitter.label("__rt_amr_str_new_set"); + emitter.instruction("ldr x0, [sp, #16]"); // result hash pointer + emitter.instruction("ldr x1, [sp, #56]"); // key pointer + emitter.instruction("ldr x2, [sp, #64]"); // key length + emitter.instruction("ldr x3, [sp, #72]"); // value low word + emitter.instruction("ldr x4, [sp, #80]"); // value high word + emitter.instruction("ldr x5, [sp, #88]"); // value runtime tag + emitter.instruction("bl __rt_hash_set"); // insert the new string-keyed entry + emitter.instruction("str x0, [sp, #16]"); // update the result pointer after possible reallocation + emitter.instruction("b __rt_amr_entry_loop"); // continue with the next entry + emitter.label("__rt_amr_int_key"); + emitter.instruction("ldr x9, [sp, #88]"); // reload the value runtime tag + emitter.instruction("cmp x9, #1"); // is the value a string? + emitter.instruction("b.eq __rt_amr_int_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp x9, #4"); // is the value below the heap-backed tag range? 
+ emitter.instruction("b.lt __rt_amr_int_set"); // scalar values need no retain + emitter.instruction("cmp x9, #7"); // is the value above the heap-backed tag range? + emitter.instruction("b.gt __rt_amr_int_set"); // non-heap tags need no retain + emitter.instruction("ldr x0, [sp, #72]"); // load the heap-backed value low word + emitter.instruction("bl __rt_incref"); // retain the heap-backed value for the result + emitter.instruction("b __rt_amr_int_set"); // continue to the insertion + emitter.label("__rt_amr_int_persist"); + emitter.instruction("ldr x1, [sp, #72]"); // string pointer to persist + emitter.instruction("ldr x2, [sp, #80]"); // string length to persist + emitter.instruction("bl __rt_str_persist"); // copy the string into an independent heap block, x1 = new pointer + emitter.instruction("str x1, [sp, #72]"); // store the persisted string pointer + emitter.instruction("str x2, [sp, #80]"); // store the persisted string length + emitter.label("__rt_amr_int_set"); + emitter.instruction("ldr x0, [sp, #16]"); // result hash pointer + emitter.instruction("ldr x1, [sp, #24]"); // next integer key + emitter.instruction("mov x2, #-1"); // key_hi = -1 marks an integer key + emitter.instruction("ldr x3, [sp, #72]"); // value low word + emitter.instruction("ldr x4, [sp, #80]"); // value high word + emitter.instruction("ldr x5, [sp, #88]"); // value runtime tag + emitter.instruction("bl __rt_hash_set"); // append the integer-keyed entry with the renumbered key + emitter.instruction("str x0, [sp, #16]"); // update the result pointer after possible reallocation + emitter.instruction("ldr x9, [sp, #24]"); // reload the next integer key counter + emitter.instruction("add x9, x9, #1"); // advance the integer key counter + emitter.instruction("str x9, [sp, #24]"); // save the advanced integer key counter + emitter.instruction("b __rt_amr_entry_loop"); // continue with the next entry + emitter.label("__rt_amr_next_which"); + emitter.instruction("ldr x9, [sp, #32]"); // 
reload the source selector + emitter.instruction("add x9, x9, #1"); // advance to the next source + emitter.instruction("str x9, [sp, #32]"); // save the advanced source selector + emitter.instruction("b __rt_amr_which_loop"); // process the next source + emitter.label("__rt_amr_done"); + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result hash pointer + emitter.instruction("ldp x29, x30, [sp, #176]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #192"); // deallocate the stack frame + emitter.instruction("ret"); // return the merged hash in x0 +} + +/// x86_64 Linux implementation of `__rt_array_merge_recursive`. +/// Input: rdi = first hash, rsi = second hash +/// Output: rax = new owned merged hash +fn emit_array_merge_recursive_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_merge_recursive ---"); + emitter.label_global("__rt_array_merge_recursive"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("sub rsp, 160"); // reserve local spill slots for the merge state + emitter.instruction("mov QWORD PTR [rbp - 8], rdi"); // save first hash pointer + emitter.instruction("mov QWORD PTR [rbp - 16], rsi"); // save second hash pointer + emitter.instruction("mov rdi, 16"); // initial capacity for the result hash + emitter.instruction("mov rsi, 7"); // value_type 7 = mixed + emitter.instruction("call __rt_hash_new"); // create the result hash, rax = result + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // save the result hash pointer + emitter.instruction("mov QWORD PTR [rbp - 32], 0"); // next integer key counter = 0 + emitter.instruction("mov QWORD PTR [rbp - 40], 0"); // source selector which = 0 + emitter.label("__rt_amr_which_loop"); + emitter.instruction("mov rax, QWORD PTR [rbp - 40]"); // reload the source selector + emitter.instruction("cmp rax, 2"); // 
have both sources been processed? + emitter.instruction("jge __rt_amr_done"); // finish once both inputs are merged + emitter.instruction("test rax, rax"); // is the selector zero (first input)? + emitter.instruction("jne __rt_amr_pick_b"); // selector 1 chooses the second input + emitter.instruction("mov r10, QWORD PTR [rbp - 8]"); // load the first input as the current source + emitter.instruction("jmp __rt_amr_pick_done"); // store the chosen source + emitter.label("__rt_amr_pick_b"); + emitter.instruction("mov r10, QWORD PTR [rbp - 16]"); // load the second input as the current source + emitter.label("__rt_amr_pick_done"); + emitter.instruction("mov QWORD PTR [rbp - 56], r10"); // save the current source pointer + emitter.instruction("mov QWORD PTR [rbp - 48], 0"); // iterator cursor = 0 + emitter.label("__rt_amr_entry_loop"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 56]"); // reload the current source pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 48]"); // reload the iterator cursor + emitter.instruction("call __rt_hash_iter_next"); // next entry: rax=cursor,rdi=kptr,rdx=klen,rcx=vlo,r8=vhi,r9=vtag + emitter.instruction("cmp rax, -1"); // has iteration reached the end? + emitter.instruction("je __rt_amr_next_which"); // advance to the next source when done + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // save the next iterator cursor + emitter.instruction("mov QWORD PTR [rbp - 64], rdi"); // save key pointer + emitter.instruction("mov QWORD PTR [rbp - 72], rdx"); // save key length + emitter.instruction("mov QWORD PTR [rbp - 80], rcx"); // save value low word + emitter.instruction("mov QWORD PTR [rbp - 88], r8"); // save value high word + emitter.instruction("mov QWORD PTR [rbp - 96], r9"); // save value runtime tag + emitter.instruction("cmp rdx, -1"); // is this an integer key (key length == -1)? 
+ emitter.instruction("je __rt_amr_int_key"); // integer keys append with renumbering + emitter.comment("-- string key: look it up in the result --"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 24]"); // result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 64]"); // key pointer + emitter.instruction("mov rdx, QWORD PTR [rbp - 72]"); // key length + emitter.instruction("call __rt_hash_get"); // look up the key: rax=found,rdi=e_lo,rsi=e_hi,rcx=e_tag + emitter.instruction("test rax, rax"); // was the key already present? + emitter.instruction("jz __rt_amr_str_new"); // absent key is added directly + emitter.instruction("mov QWORD PTR [rbp - 104], rdi"); // save existing value low word + emitter.instruction("mov QWORD PTR [rbp - 112], rsi"); // save existing value high word + emitter.instruction("mov QWORD PTR [rbp - 120], rcx"); // save existing value runtime tag + emitter.comment("-- build the existing operand (keep assoc arrays, wrap others) --"); + emitter.instruction("cmp rcx, 5"); // is the existing value an associative array? 
+ emitter.instruction("jne __rt_amr_ea_wrap"); // wrap non-assoc existing values into a list + emitter.instruction("mov QWORD PTR [rbp - 128], rdi"); // keep the associative array as the existing operand + emitter.instruction("mov QWORD PTR [rbp - 144], 0"); // existing operand is borrowed (not newly created) + emitter.instruction("jmp __rt_amr_na"); // build the new operand next + emitter.label("__rt_amr_ea_wrap"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 120]"); // existing value tag + emitter.instruction("mov rsi, QWORD PTR [rbp - 104]"); // existing value low word + emitter.instruction("mov rdx, QWORD PTR [rbp - 112]"); // existing value high word + emitter.instruction("call __rt_amr_box_value"); // wrap the existing value into a list, rax = wrapper + emitter.instruction("mov QWORD PTR [rbp - 128], rax"); // save the existing operand + emitter.instruction("mov QWORD PTR [rbp - 144], 1"); // mark the existing operand as newly created + emitter.label("__rt_amr_na"); + emitter.instruction("mov rcx, QWORD PTR [rbp - 96]"); // reload the new value tag + emitter.instruction("cmp rcx, 5"); // is the new value an associative array? 
+ emitter.instruction("jne __rt_amr_na_wrap"); // wrap non-assoc new values into a list + emitter.instruction("mov rax, QWORD PTR [rbp - 80]"); // reload the new value low word + emitter.instruction("mov QWORD PTR [rbp - 136], rax"); // keep the associative array as the new operand + emitter.instruction("mov QWORD PTR [rbp - 152], 0"); // new operand is borrowed (not newly created) + emitter.instruction("jmp __rt_amr_merge"); // merge the two operands + emitter.label("__rt_amr_na_wrap"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 96]"); // new value tag + emitter.instruction("mov rsi, QWORD PTR [rbp - 80]"); // new value low word + emitter.instruction("mov rdx, QWORD PTR [rbp - 88]"); // new value high word + emitter.instruction("call __rt_amr_box_value"); // wrap the new value into a list, rax = wrapper + emitter.instruction("mov QWORD PTR [rbp - 136], rax"); // save the new operand + emitter.instruction("mov QWORD PTR [rbp - 152], 1"); // mark the new operand as newly created + emitter.label("__rt_amr_merge"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 128]"); // existing operand + emitter.instruction("mov rsi, QWORD PTR [rbp - 136]"); // new operand + emitter.instruction("call __rt_array_merge_recursive"); // recursively merge the two operands, rax = merged + emitter.instruction("mov rcx, rax"); // merged hash becomes the new value low word + emitter.instruction("mov rdi, QWORD PTR [rbp - 24]"); // result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 64]"); // key pointer + emitter.instruction("mov rdx, QWORD PTR [rbp - 72]"); // key length + emitter.instruction("xor r8, r8"); // array values carry no high word + emitter.instruction("mov r9, 5"); // value tag 5 = associative array + emitter.instruction("call __rt_hash_set"); // store the merged value (releases the previous value) + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // update the result pointer after possible reallocation + emitter.instruction("mov rax, QWORD PTR 
[rbp - 144]"); // reload the existing-operand ownership flag + emitter.instruction("test rax, rax"); // was the existing operand newly created? + emitter.instruction("jz __rt_amr_free_na"); // skip releasing a borrowed existing operand + emitter.instruction("mov rdi, QWORD PTR [rbp - 128]"); // load the newly created existing operand + emitter.instruction("call __rt_decref_hash"); // release the existing-operand wrapper + emitter.label("__rt_amr_free_na"); + emitter.instruction("mov rax, QWORD PTR [rbp - 152]"); // reload the new-operand ownership flag + emitter.instruction("test rax, rax"); // was the new operand newly created? + emitter.instruction("jz __rt_amr_entry_loop"); // skip releasing a borrowed new operand + emitter.instruction("mov rdi, QWORD PTR [rbp - 136]"); // load the newly created new operand + emitter.instruction("call __rt_decref_hash"); // release the new-operand wrapper + emitter.instruction("jmp __rt_amr_entry_loop"); // continue with the next entry + emitter.label("__rt_amr_str_new"); + emitter.instruction("mov rax, QWORD PTR [rbp - 96]"); // reload the value runtime tag + emitter.instruction("cmp rax, 1"); // is the value a string? + emitter.instruction("je __rt_amr_str_new_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp rax, 4"); // is the value below the heap-backed tag range? + emitter.instruction("jl __rt_amr_str_new_set"); // scalar values need no retain + emitter.instruction("cmp rax, 7"); // is the value above the heap-backed tag range? 
+ emitter.instruction("jg __rt_amr_str_new_set"); // non-heap tags need no retain + emitter.instruction("mov rdi, QWORD PTR [rbp - 80]"); // load the heap-backed value low word + emitter.instruction("call __rt_incref"); // retain the heap-backed value for the result + emitter.instruction("jmp __rt_amr_str_new_set"); // continue to the insertion + emitter.label("__rt_amr_str_new_persist"); + emitter.instruction("mov rax, QWORD PTR [rbp - 80]"); // string pointer to persist + emitter.instruction("mov rdx, QWORD PTR [rbp - 88]"); // string length to persist + emitter.instruction("call __rt_str_persist"); // copy the string into an independent heap block, rax = new pointer + emitter.instruction("mov QWORD PTR [rbp - 80], rax"); // store the persisted string pointer + emitter.instruction("mov QWORD PTR [rbp - 88], rdx"); // store the persisted string length + emitter.label("__rt_amr_str_new_set"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 24]"); // result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 64]"); // key pointer + emitter.instruction("mov rdx, QWORD PTR [rbp - 72]"); // key length + emitter.instruction("mov rcx, QWORD PTR [rbp - 80]"); // value low word + emitter.instruction("mov r8, QWORD PTR [rbp - 88]"); // value high word + emitter.instruction("mov r9, QWORD PTR [rbp - 96]"); // value runtime tag + emitter.instruction("call __rt_hash_set"); // insert the new string-keyed entry + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // update the result pointer after possible reallocation + emitter.instruction("jmp __rt_amr_entry_loop"); // continue with the next entry + emitter.label("__rt_amr_int_key"); + emitter.instruction("mov rax, QWORD PTR [rbp - 96]"); // reload the value runtime tag + emitter.instruction("cmp rax, 1"); // is the value a string? + emitter.instruction("je __rt_amr_int_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp rax, 4"); // is the value below the heap-backed tag range? 
+ emitter.instruction("jl __rt_amr_int_set"); // scalar values need no retain + emitter.instruction("cmp rax, 7"); // is the value above the heap-backed tag range? + emitter.instruction("jg __rt_amr_int_set"); // non-heap tags need no retain + emitter.instruction("mov rdi, QWORD PTR [rbp - 80]"); // load the heap-backed value low word + emitter.instruction("call __rt_incref"); // retain the heap-backed value for the result + emitter.instruction("jmp __rt_amr_int_set"); // continue to the insertion + emitter.label("__rt_amr_int_persist"); + emitter.instruction("mov rax, QWORD PTR [rbp - 80]"); // string pointer to persist + emitter.instruction("mov rdx, QWORD PTR [rbp - 88]"); // string length to persist + emitter.instruction("call __rt_str_persist"); // copy the string into an independent heap block, rax = new pointer + emitter.instruction("mov QWORD PTR [rbp - 80], rax"); // store the persisted string pointer + emitter.instruction("mov QWORD PTR [rbp - 88], rdx"); // store the persisted string length + emitter.label("__rt_amr_int_set"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 24]"); // result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 32]"); // next integer key + emitter.instruction("mov rdx, -1"); // key_hi = -1 marks an integer key + emitter.instruction("mov rcx, QWORD PTR [rbp - 80]"); // value low word + emitter.instruction("mov r8, QWORD PTR [rbp - 88]"); // value high word + emitter.instruction("mov r9, QWORD PTR [rbp - 96]"); // value runtime tag + emitter.instruction("call __rt_hash_set"); // append the integer-keyed entry with the renumbered key + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // update the result pointer after possible reallocation + emitter.instruction("mov rax, QWORD PTR [rbp - 32]"); // reload the next integer key counter + emitter.instruction("add rax, 1"); // advance the integer key counter + emitter.instruction("mov QWORD PTR [rbp - 32], rax"); // save the advanced integer key counter + 
emitter.instruction("jmp __rt_amr_entry_loop"); // continue with the next entry + emitter.label("__rt_amr_next_which"); + emitter.instruction("mov rax, QWORD PTR [rbp - 40]"); // reload the source selector + emitter.instruction("add rax, 1"); // advance to the next source + emitter.instruction("mov QWORD PTR [rbp - 40], rax"); // save the advanced source selector + emitter.instruction("jmp __rt_amr_which_loop"); // process the next source + emitter.label("__rt_amr_done"); + emitter.instruction("mov rax, QWORD PTR [rbp - 24]"); // rax = result hash pointer + emitter.instruction("add rsp, 160"); // release the local spill slots + emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return the merged hash in rax +} + diff --git a/src/codegen/runtime/arrays/array_multisort.rs b/src/codegen/runtime/arrays/array_multisort.rs new file mode 100644 index 000000000..1c7944ec4 --- /dev/null +++ b/src/codegen/runtime/arrays/array_multisort.rs @@ -0,0 +1,103 @@ +//! Purpose: +//! Emits the `__rt_array_multisort` runtime helper for array_multisort over two parallel arrays. +//! Stable-sorts the first indexed array ascending and applies the same element moves to the second. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - Leaf helper (no calls); in-place tandem bubble sort; scalar (8-byte) elements, equal-length arrays. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_multisort: sort arr1 ascending in place, reordering arr2 in tandem. +/// Input: x0 = arr1 pointer (primary sort key), x1 = arr2 pointer (reordered to match) +/// Output: none (both arrays mutated in place) +/// +/// Stable tandem bubble sort: each time two adjacent arr1 elements are out of ascending +/// order they are swapped together with the corresponding arr2 elements. 
Uses arr1 length +/// for both arrays (PHP requires equal-length arrays). Scalar 8-byte elements only. +pub fn emit_array_multisort(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_multisort_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_multisort ---"); + emitter.label_global("__rt_array_multisort"); + emitter.instruction("ldr x9, [x0]"); // x9 = arr1 length (used for both arrays) + emitter.instruction("add x10, x0, #24"); // x10 = arr1 data base (skip header) + emitter.instruction("add x11, x1, #24"); // x11 = arr2 data base (skip header) + emitter.instruction("cmp x9, #2"); // arrays shorter than 2 elements are already sorted + emitter.instruction("b.lt __rt_array_multisort_done"); // nothing to sort + emitter.label("__rt_array_multisort_outer"); + emitter.instruction("mov x12, #0"); // swapped flag = 0 for this pass + emitter.instruction("mov x13, #0"); // inner index j = 0 + emitter.instruction("sub x14, x9, #1"); // x14 = length - 1 (last comparable index) + emitter.label("__rt_array_multisort_inner"); + emitter.instruction("cmp x13, x14"); // has j reached length - 1? + emitter.instruction("b.ge __rt_array_multisort_pass_end"); // end of this bubble pass + emitter.instruction("add x16, x13, #1"); // x16 = j + 1 + emitter.instruction("ldr x15, [x10, x13, lsl #3]"); // x15 = arr1[j] + emitter.instruction("ldr x17, [x10, x16, lsl #3]"); // x17 = arr1[j+1] + emitter.instruction("cmp x15, x17"); // is arr1[j] greater than arr1[j+1]? 
+ emitter.instruction("b.le __rt_array_multisort_no_swap"); // already in ascending order (stable: keep equal pairs) + emitter.instruction("str x17, [x10, x13, lsl #3]"); // swap: arr1[j] = old arr1[j+1] + emitter.instruction("str x15, [x10, x16, lsl #3]"); // swap: arr1[j+1] = old arr1[j] + emitter.instruction("ldr x15, [x11, x13, lsl #3]"); // x15 = arr2[j] + emitter.instruction("ldr x17, [x11, x16, lsl #3]"); // x17 = arr2[j+1] + emitter.instruction("str x17, [x11, x13, lsl #3]"); // tandem swap: arr2[j] = old arr2[j+1] + emitter.instruction("str x15, [x11, x16, lsl #3]"); // tandem swap: arr2[j+1] = old arr2[j] + emitter.instruction("mov x12, #1"); // mark that a swap happened this pass + emitter.label("__rt_array_multisort_no_swap"); + emitter.instruction("add x13, x13, #1"); // advance the inner index + emitter.instruction("b __rt_array_multisort_inner"); // continue the bubble pass + emitter.label("__rt_array_multisort_pass_end"); + emitter.instruction("cbnz x12, __rt_array_multisort_outer"); // repeat passes until no swaps occur + emitter.label("__rt_array_multisort_done"); + emitter.instruction("ret"); // both arrays are sorted in place +} + +/// x86_64 Linux implementation of `__rt_array_multisort`. 
+/// Input: rdi = arr1 pointer, rsi = arr2 pointer +/// Output: none (both arrays mutated in place) +fn emit_array_multisort_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_multisort ---"); + emitter.label_global("__rt_array_multisort"); + emitter.instruction("mov r9, QWORD PTR [rdi]"); // r9 = arr1 length (used for both arrays) + emitter.instruction("lea r10, [rdi + 24]"); // r10 = arr1 data base (skip header) + emitter.instruction("lea r11, [rsi + 24]"); // r11 = arr2 data base (skip header) + emitter.instruction("cmp r9, 2"); // arrays shorter than 2 elements are already sorted + emitter.instruction("jl __rt_array_multisort_done"); // nothing to sort + emitter.label("__rt_array_multisort_outer"); + emitter.instruction("xor r8, r8"); // swapped flag = 0 for this pass + emitter.instruction("xor rax, rax"); // inner index j = 0 + emitter.label("__rt_array_multisort_inner"); + emitter.instruction("mov rcx, r9"); // copy the length + emitter.instruction("sub rcx, 1"); // rcx = length - 1 (last comparable index) + emitter.instruction("cmp rax, rcx"); // has j reached length - 1? + emitter.instruction("jge __rt_array_multisort_pass_end"); // end of this bubble pass + emitter.instruction("mov rcx, QWORD PTR [r10 + rax * 8]"); // rcx = arr1[j] + emitter.instruction("mov rdx, QWORD PTR [r10 + rax * 8 + 8]"); // rdx = arr1[j+1] + emitter.instruction("cmp rcx, rdx"); // is arr1[j] greater than arr1[j+1]? 
+ emitter.instruction("jle __rt_array_multisort_no_swap"); // already in ascending order (stable: keep equal pairs) + emitter.instruction("mov QWORD PTR [r10 + rax * 8], rdx"); // swap: arr1[j] = old arr1[j+1] + emitter.instruction("mov QWORD PTR [r10 + rax * 8 + 8], rcx"); // swap: arr1[j+1] = old arr1[j] + emitter.instruction("mov rcx, QWORD PTR [r11 + rax * 8]"); // rcx = arr2[j] + emitter.instruction("mov rdx, QWORD PTR [r11 + rax * 8 + 8]"); // rdx = arr2[j+1] + emitter.instruction("mov QWORD PTR [r11 + rax * 8], rdx"); // tandem swap: arr2[j] = old arr2[j+1] + emitter.instruction("mov QWORD PTR [r11 + rax * 8 + 8], rcx"); // tandem swap: arr2[j+1] = old arr2[j] + emitter.instruction("mov r8, 1"); // mark that a swap happened this pass + emitter.label("__rt_array_multisort_no_swap"); + emitter.instruction("add rax, 1"); // advance the inner index + emitter.instruction("jmp __rt_array_multisort_inner"); // continue the bubble pass + emitter.label("__rt_array_multisort_pass_end"); + emitter.instruction("test r8, r8"); // did any swap happen this pass? + emitter.instruction("jnz __rt_array_multisort_outer"); // repeat passes until no swaps occur + emitter.label("__rt_array_multisort_done"); + emitter.instruction("ret"); // both arrays are sorted in place +} + diff --git a/src/codegen/runtime/arrays/array_replace.rs b/src/codegen/runtime/arrays/array_replace.rs new file mode 100644 index 000000000..966e7e5d1 --- /dev/null +++ b/src/codegen/runtime/arrays/array_replace.rs @@ -0,0 +1,139 @@ +//! Purpose: +//! Emits the `__rt_array_replace` runtime helper assembly for array_replace. +//! Clones the first associative array, then overwrites/appends every entry of the second (right-wins). +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - Operates on hash tables; before insertion, heap-backed values are retained and string values are persisted as independent copies for the result. 
+ +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_replace: replace/append entries of hash1 with entries of hash2 (later wins). +/// Input: x0 = first hash pointer, x1 = second hash pointer +/// Output: x0 = new owned hash pointer (clone of hash1 with hash2 entries inserted) +/// +/// hash1 is shallow-cloned (keys re-persisted, child values retained), then every +/// entry of hash2 is inserted via `__rt_hash_set`, which overwrites matching keys in +/// place (preserving their position) and appends new keys. String values from hash2 are +/// persisted as independent copies; heap-backed values are retained for the new owner. +pub fn emit_array_replace(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_replace_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_replace ---"); + emitter.label_global("__rt_array_replace"); + emitter.instruction("sub sp, sp, #80"); // allocate the array_replace stack frame + emitter.instruction("stp x29, x30, [sp, #64]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #64"); // set up the new frame pointer + emitter.instruction("str x1, [sp, #0]"); // save the second hash pointer + emitter.instruction("bl __rt_hash_clone_shallow"); // clone hash1 into an owned result hash, x0 = result + emitter.instruction("str x0, [sp, #8]"); // save the cloned result hash pointer + emitter.instruction("str xzr, [sp, #16]"); // iterator cursor = 0 (start from hash2 head) + emitter.label("__rt_array_replace_loop"); + emitter.instruction("ldr x0, [sp, #0]"); // x0 = hash2 pointer + emitter.instruction("ldr x1, [sp, #16]"); // x1 = current iterator cursor + emitter.instruction("bl __rt_hash_iter_next"); // next hash2 entry: x0=cursor,x1=kptr,x2=klen,x3=vlo,x4=vhi,x5=vtag + emitter.instruction("cmn x0, #1"); // has iteration reached the end (cursor == -1)? 
+ emitter.instruction("b.eq __rt_array_replace_done"); // stop once every hash2 entry has been inserted + emitter.instruction("str x0, [sp, #16]"); // save the next iterator cursor + emitter.instruction("str x1, [sp, #24]"); // save key pointer + emitter.instruction("str x2, [sp, #32]"); // save key length + emitter.instruction("str x3, [sp, #40]"); // save value low word + emitter.instruction("str x4, [sp, #48]"); // save value high word + emitter.instruction("str x5, [sp, #56]"); // save value runtime tag + emitter.instruction("cmp x5, #1"); // is the value a string? + emitter.instruction("b.eq __rt_array_replace_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp x5, #4"); // is the value below the heap-backed tag range? + emitter.instruction("b.lt __rt_array_replace_insert"); // scalar values need no retain + emitter.instruction("cmp x5, #7"); // is the value above the heap-backed tag range? + emitter.instruction("b.gt __rt_array_replace_insert"); // non-heap tags need no retain + emitter.instruction("ldr x0, [sp, #40]"); // load the heap-backed value pointer from the saved value low word + emitter.instruction("bl __rt_incref"); // retain the heap-backed value for the result hash owner + emitter.instruction("b __rt_array_replace_insert"); // continue to the insertion + emitter.label("__rt_array_replace_persist"); + emitter.instruction("ldr x1, [sp, #40]"); // string pointer to persist + emitter.instruction("ldr x2, [sp, #48]"); // string length to persist + emitter.instruction("bl __rt_str_persist"); // copy the string into an independent heap block, x1 = new pointer + emitter.instruction("str x1, [sp, #40]"); // store the persisted string pointer + emitter.instruction("str x2, [sp, #48]"); // store the persisted string length + emitter.label("__rt_array_replace_insert"); + emitter.instruction("ldr x0, [sp, #8]"); // x0 = result hash pointer + emitter.instruction("ldr x1, [sp, #24]"); // reload key pointer + 
emitter.instruction("ldr x2, [sp, #32]"); // reload key length + emitter.instruction("ldr x3, [sp, #40]"); // reload value low word + emitter.instruction("ldr x4, [sp, #48]"); // reload value high word + emitter.instruction("ldr x5, [sp, #56]"); // reload value runtime tag + emitter.instruction("bl __rt_hash_set"); // overwrite or append the entry into the result hash + emitter.instruction("str x0, [sp, #8]"); // update the result pointer after possible reallocation + emitter.instruction("b __rt_array_replace_loop"); // continue with the next hash2 entry + emitter.label("__rt_array_replace_done"); + emitter.instruction("ldr x0, [sp, #8]"); // x0 = result hash pointer + emitter.instruction("ldp x29, x30, [sp, #64]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #80"); // deallocate the stack frame + emitter.instruction("ret"); // return the result hash in x0 +} + +/// x86_64 Linux implementation of `__rt_array_replace`. +/// Input: rdi = first hash pointer, rsi = second hash pointer +/// Output: rax = new owned hash pointer (clone of hash1 with hash2 entries inserted) +fn emit_array_replace_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_replace ---"); + emitter.label_global("__rt_array_replace"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("sub rsp, 64"); // reserve local spill slots for the replace loop state + emitter.instruction("mov QWORD PTR [rbp - 8], rsi"); // save the second hash pointer + emitter.instruction("call __rt_hash_clone_shallow"); // clone hash1 into an owned result hash, rax = result + emitter.instruction("mov QWORD PTR [rbp - 16], rax"); // save the cloned result hash pointer + emitter.instruction("mov QWORD PTR [rbp - 24], 0"); // iterator cursor = 0 (start from hash2 head) + emitter.label("__rt_array_replace_loop"); + emitter.instruction("mov 
rdi, QWORD PTR [rbp - 8]"); // rdi = hash2 pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 24]"); // rsi = current iterator cursor + emitter.instruction("call __rt_hash_iter_next"); // next hash2 entry: rax=cursor,rdi=kptr,rdx=klen,rcx=vlo,r8=vhi,r9=vtag + emitter.instruction("cmp rax, -1"); // has iteration reached the end? + emitter.instruction("je __rt_array_replace_done"); // stop once every hash2 entry has been inserted + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // save the next iterator cursor + emitter.instruction("mov QWORD PTR [rbp - 32], rdi"); // save key pointer + emitter.instruction("mov QWORD PTR [rbp - 40], rdx"); // save key length + emitter.instruction("mov QWORD PTR [rbp - 48], rcx"); // save value low word + emitter.instruction("mov QWORD PTR [rbp - 56], r8"); // save value high word + emitter.instruction("mov QWORD PTR [rbp - 64], r9"); // save value runtime tag + emitter.instruction("cmp r9, 1"); // is the value a string? + emitter.instruction("je __rt_array_replace_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp r9, 4"); // is the value below the heap-backed tag range? + emitter.instruction("jl __rt_array_replace_insert"); // scalar values need no retain + emitter.instruction("cmp r9, 7"); // is the value above the heap-backed tag range? 
+ emitter.instruction("jg __rt_array_replace_insert"); // non-heap tags need no retain + emitter.instruction("mov rdi, QWORD PTR [rbp - 48]"); // load the heap-backed value pointer from the saved value low word + emitter.instruction("call __rt_incref"); // retain the heap-backed value for the result hash owner + emitter.instruction("jmp __rt_array_replace_insert"); // continue to the insertion + emitter.label("__rt_array_replace_persist"); + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // string pointer to persist + emitter.instruction("mov rdx, QWORD PTR [rbp - 56]"); // string length to persist + emitter.instruction("call __rt_str_persist"); // copy the string into an independent heap block, rax = new pointer + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // store the persisted string pointer + emitter.instruction("mov QWORD PTR [rbp - 56], rdx"); // store the persisted string length + emitter.label("__rt_array_replace_insert"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 16]"); // rdi = result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 32]"); // reload key pointer + emitter.instruction("mov rdx, QWORD PTR [rbp - 40]"); // reload key length + emitter.instruction("mov rcx, QWORD PTR [rbp - 48]"); // reload value low word + emitter.instruction("mov r8, QWORD PTR [rbp - 56]"); // reload value high word + emitter.instruction("mov r9, QWORD PTR [rbp - 64]"); // reload value runtime tag + emitter.instruction("call __rt_hash_set"); // overwrite or append the entry into the result hash + emitter.instruction("mov QWORD PTR [rbp - 16], rax"); // update the result pointer after possible reallocation + emitter.instruction("jmp __rt_array_replace_loop"); // continue with the next hash2 entry + emitter.label("__rt_array_replace_done"); + emitter.instruction("mov rax, QWORD PTR [rbp - 16]"); // rax = result hash pointer + emitter.instruction("add rsp, 64"); // release the local spill slots + emitter.instruction("pop rbp"); // restore 
the caller frame pointer + emitter.instruction("ret"); // return the result hash in rax +} + diff --git a/src/codegen/runtime/arrays/array_replace_recursive.rs b/src/codegen/runtime/arrays/array_replace_recursive.rs new file mode 100644 index 000000000..f77dc7caa --- /dev/null +++ b/src/codegen/runtime/arrays/array_replace_recursive.rs @@ -0,0 +1,190 @@ +//! Purpose: +//! Emits the `__rt_array_replace_recursive` runtime helper for array_replace_recursive. +//! Recursively merges hash2 into a clone of hash1, recursing when both values at a key are associative arrays. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - Self-recursive over nested associative arrays; `__rt_hash_set` releases overwritten values, keeping refcounts balanced. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_replace_recursive: deep right-wins merge of two associative arrays. +/// Input: x0 = hash1 pointer, x1 = hash2 pointer +/// Output: x0 = new owned hash pointer +/// +/// Clones hash1, then for every hash2 entry: if the key exists in hash1 and both values +/// are associative arrays (tag 5), recurses and stores the merged sub-array; otherwise the +/// hash2 value overwrites/appends (right-wins). `__rt_hash_set` releases the previous value +/// on overwrite, so the recursively cloned children stay refcount-balanced. 
+pub fn emit_array_replace_recursive(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_replace_recursive_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_replace_recursive ---"); + emitter.label_global("__rt_array_replace_recursive"); + emitter.instruction("sub sp, sp, #112"); // allocate the recursive-replace stack frame + emitter.instruction("stp x29, x30, [sp, #96]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #96"); // set up the new frame pointer + emitter.instruction("str x0, [sp, #0]"); // save hash1 pointer + emitter.instruction("str x1, [sp, #8]"); // save hash2 pointer + emitter.instruction("bl __rt_hash_clone_shallow"); // clone hash1 into an owned result hash, x0 = result + emitter.instruction("str x0, [sp, #16]"); // save the result hash pointer + emitter.instruction("str xzr, [sp, #24]"); // iterator cursor = 0 (start from hash2 head) + emitter.label("__rt_array_replace_recursive_loop"); + emitter.instruction("ldr x0, [sp, #8]"); // x0 = hash2 pointer + emitter.instruction("ldr x1, [sp, #24]"); // x1 = current iterator cursor + emitter.instruction("bl __rt_hash_iter_next"); // next hash2 entry: x0=cursor,x1=kptr,x2=klen,x3=vlo,x4=vhi,x5=vtag + emitter.instruction("cmn x0, #1"); // has iteration reached the end (cursor == -1)? + emitter.instruction("b.eq __rt_array_replace_recursive_done"); // stop once every hash2 entry is merged + emitter.instruction("str x0, [sp, #24]"); // save the next iterator cursor + emitter.instruction("str x1, [sp, #32]"); // save key pointer + emitter.instruction("str x2, [sp, #40]"); // save key length + emitter.instruction("str x3, [sp, #48]"); // save hash2 value low word + emitter.instruction("str x4, [sp, #56]"); // save hash2 value high word + emitter.instruction("str x5, [sp, #64]"); // save hash2 value runtime tag + emitter.instruction("cmp x5, #5"); // is the hash2 value an associative array? 
+ emitter.instruction("b.ne __rt_array_replace_recursive_over"); // non-array values overwrite directly + emitter.instruction("ldr x0, [sp, #0]"); // x0 = hash1 pointer + emitter.instruction("ldr x1, [sp, #32]"); // x1 = key low word + emitter.instruction("ldr x2, [sp, #40]"); // x2 = key high word (-1 marks an integer key) + emitter.instruction("bl __rt_hash_get"); // look up the key in hash1: x0=found,x1=vlo,x2=vhi,x3=vtag + emitter.instruction("cbz x0, __rt_array_replace_recursive_over"); // absent in hash1 means append, not recurse + emitter.instruction("cmp x3, #5"); // is the hash1 value also an associative array? + emitter.instruction("b.ne __rt_array_replace_recursive_over"); // only recurse when both values are arrays + emitter.instruction("str x1, [sp, #72]"); // save the hash1 nested array pointer + emitter.instruction("ldr x0, [sp, #72]"); // x0 = hash1 nested array (recursion arg1) + emitter.instruction("ldr x1, [sp, #48]"); // x1 = hash2 nested array (recursion arg2) + emitter.instruction("bl __rt_array_replace_recursive"); // recurse into the nested arrays, x0 = merged sub-array + emitter.instruction("mov x3, x0"); // merged sub-array becomes the new value low word + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result hash pointer + emitter.instruction("ldr x1, [sp, #32]"); // reload key pointer + emitter.instruction("ldr x2, [sp, #40]"); // reload key length + emitter.instruction("mov x4, #0"); // array values use no high word + emitter.instruction("mov x5, #5"); // value tag 5 = associative array + emitter.instruction("bl __rt_hash_set"); // store the merged sub-array (releases the previous value) + emitter.instruction("str x0, [sp, #16]"); // update the result pointer after possible reallocation + emitter.instruction("b __rt_array_replace_recursive_loop"); // continue with the next hash2 entry + emitter.label("__rt_array_replace_recursive_over"); + emitter.instruction("ldr x9, [sp, #64]"); // reload the hash2 value runtime tag + 
emitter.instruction("cmp x9, #1"); // is the value a string? + emitter.instruction("b.eq __rt_array_replace_recursive_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp x9, #4"); // is the value below the heap-backed tag range? + emitter.instruction("b.lt __rt_array_replace_recursive_insert"); // scalar values need no retain + emitter.instruction("cmp x9, #7"); // is the value above the heap-backed tag range? + emitter.instruction("b.gt __rt_array_replace_recursive_insert"); // non-heap tags need no retain + emitter.instruction("ldr x0, [sp, #48]"); // load the heap-backed value low word + emitter.instruction("bl __rt_incref"); // retain the heap-backed value for the result hash owner + emitter.instruction("b __rt_array_replace_recursive_insert"); // continue to the insertion + emitter.label("__rt_array_replace_recursive_persist"); + emitter.instruction("ldr x1, [sp, #48]"); // string pointer to persist + emitter.instruction("ldr x2, [sp, #56]"); // string length to persist + emitter.instruction("bl __rt_str_persist"); // copy the string into an independent heap block, x1 = new pointer + emitter.instruction("str x1, [sp, #48]"); // store the persisted string pointer + emitter.instruction("str x2, [sp, #56]"); // store the persisted string length + emitter.label("__rt_array_replace_recursive_insert"); + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result hash pointer + emitter.instruction("ldr x1, [sp, #32]"); // reload key pointer + emitter.instruction("ldr x2, [sp, #40]"); // reload key length + emitter.instruction("ldr x3, [sp, #48]"); // reload value low word + emitter.instruction("ldr x4, [sp, #56]"); // reload value high word + emitter.instruction("ldr x5, [sp, #64]"); // reload value runtime tag + emitter.instruction("bl __rt_hash_set"); // overwrite or append the value into the result hash + emitter.instruction("str x0, [sp, #16]"); // update the result pointer after possible reallocation + emitter.instruction("b 
__rt_array_replace_recursive_loop"); // continue with the next hash2 entry + emitter.label("__rt_array_replace_recursive_done"); + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result hash pointer + emitter.instruction("ldp x29, x30, [sp, #96]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #112"); // deallocate the stack frame + emitter.instruction("ret"); // return the result hash in x0 +} + +/// x86_64 Linux implementation of `__rt_array_replace_recursive`. +/// Input: rdi = hash1 pointer, rsi = hash2 pointer +/// Output: rax = new owned hash pointer +fn emit_array_replace_recursive_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_replace_recursive ---"); + emitter.label_global("__rt_array_replace_recursive"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("sub rsp, 96"); // reserve local spill slots for the recursive merge state + emitter.instruction("mov QWORD PTR [rbp - 8], rdi"); // save hash1 pointer + emitter.instruction("mov QWORD PTR [rbp - 16], rsi"); // save hash2 pointer + emitter.instruction("call __rt_hash_clone_shallow"); // clone hash1 into an owned result hash, rax = result + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // save the result hash pointer + emitter.instruction("mov QWORD PTR [rbp - 32], 0"); // iterator cursor = 0 (start from hash2 head) + emitter.label("__rt_array_replace_recursive_loop"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 16]"); // rdi = hash2 pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 32]"); // rsi = current iterator cursor + emitter.instruction("call __rt_hash_iter_next"); // next hash2 entry: rax=cursor,rdi=kptr,rdx=klen,rcx=vlo,r8=vhi,r9=vtag + emitter.instruction("cmp rax, -1"); // has iteration reached the end? 
+ emitter.instruction("je __rt_array_replace_recursive_done"); // stop once every hash2 entry is merged + emitter.instruction("mov QWORD PTR [rbp - 32], rax"); // save the next iterator cursor + emitter.instruction("mov QWORD PTR [rbp - 40], rdi"); // save key pointer + emitter.instruction("mov QWORD PTR [rbp - 48], rdx"); // save key length + emitter.instruction("mov QWORD PTR [rbp - 56], rcx"); // save hash2 value low word + emitter.instruction("mov QWORD PTR [rbp - 64], r8"); // save hash2 value high word + emitter.instruction("mov QWORD PTR [rbp - 72], r9"); // save hash2 value runtime tag + emitter.instruction("cmp r9, 5"); // is the hash2 value an associative array? + emitter.instruction("jne __rt_array_replace_recursive_over"); // non-array values overwrite directly + emitter.instruction("mov rdi, QWORD PTR [rbp - 8]"); // rdi = hash1 pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 40]"); // rsi = key low word + emitter.instruction("mov rdx, QWORD PTR [rbp - 48]"); // rdx = key high word (-1 marks an integer key) + emitter.instruction("call __rt_hash_get"); // look up the key in hash1: rax=found,rdi=vlo,rsi=vhi,rcx=vtag + emitter.instruction("test rax, rax"); // was the key present in hash1? + emitter.instruction("je __rt_array_replace_recursive_over"); // absent in hash1 means append, not recurse + emitter.instruction("cmp rcx, 5"); // is the hash1 value also an associative array? 
+ emitter.instruction("jne __rt_array_replace_recursive_over"); // only recurse when both values are arrays + emitter.instruction("mov QWORD PTR [rbp - 80], rdi"); // save the hash1 nested array pointer + emitter.instruction("mov rdi, QWORD PTR [rbp - 80]"); // rdi = hash1 nested array (recursion arg1) + emitter.instruction("mov rsi, QWORD PTR [rbp - 56]"); // rsi = hash2 nested array (recursion arg2) + emitter.instruction("call __rt_array_replace_recursive"); // recurse into the nested arrays, rax = merged sub-array + emitter.instruction("mov rcx, rax"); // merged sub-array becomes the new value low word + emitter.instruction("mov rdi, QWORD PTR [rbp - 24]"); // rdi = result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 40]"); // reload key pointer + emitter.instruction("mov rdx, QWORD PTR [rbp - 48]"); // reload key length + emitter.instruction("xor r8, r8"); // array values use no high word + emitter.instruction("mov r9, 5"); // value tag 5 = associative array + emitter.instruction("call __rt_hash_set"); // store the merged sub-array (releases the previous value) + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // update the result pointer after possible reallocation + emitter.instruction("jmp __rt_array_replace_recursive_loop"); // continue with the next hash2 entry + emitter.label("__rt_array_replace_recursive_over"); + emitter.instruction("mov r10, QWORD PTR [rbp - 72]"); // reload the hash2 value runtime tag + emitter.instruction("cmp r10, 1"); // is the value a string? + emitter.instruction("je __rt_array_replace_recursive_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp r10, 4"); // is the value below the heap-backed tag range? + emitter.instruction("jl __rt_array_replace_recursive_insert"); // scalar values need no retain + emitter.instruction("cmp r10, 7"); // is the value above the heap-backed tag range? 
+ emitter.instruction("jg __rt_array_replace_recursive_insert"); // non-heap tags need no retain + emitter.instruction("mov rdi, QWORD PTR [rbp - 56]"); // load the heap-backed value low word + emitter.instruction("call __rt_incref"); // retain the heap-backed value for the result hash owner + emitter.instruction("jmp __rt_array_replace_recursive_insert"); // continue to the insertion + emitter.label("__rt_array_replace_recursive_persist"); + emitter.instruction("mov rax, QWORD PTR [rbp - 56]"); // string pointer to persist + emitter.instruction("mov rdx, QWORD PTR [rbp - 64]"); // string length to persist + emitter.instruction("call __rt_str_persist"); // copy the string into an independent heap block, rax = new pointer + emitter.instruction("mov QWORD PTR [rbp - 56], rax"); // store the persisted string pointer + emitter.instruction("mov QWORD PTR [rbp - 64], rdx"); // store the persisted string length + emitter.label("__rt_array_replace_recursive_insert"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 24]"); // rdi = result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 40]"); // reload key pointer + emitter.instruction("mov rdx, QWORD PTR [rbp - 48]"); // reload key length + emitter.instruction("mov rcx, QWORD PTR [rbp - 56]"); // reload value low word + emitter.instruction("mov r8, QWORD PTR [rbp - 64]"); // reload value high word + emitter.instruction("mov r9, QWORD PTR [rbp - 72]"); // reload value runtime tag + emitter.instruction("call __rt_hash_set"); // overwrite or append the value into the result hash + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // update the result pointer after possible reallocation + emitter.instruction("jmp __rt_array_replace_recursive_loop"); // continue with the next hash2 entry + emitter.label("__rt_array_replace_recursive_done"); + emitter.instruction("mov rax, QWORD PTR [rbp - 24]"); // rax = result hash pointer + emitter.instruction("add rsp, 96"); // release the local spill slots + 
emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return the result hash in rax +} + diff --git a/src/codegen/runtime/arrays/array_to_hash.rs b/src/codegen/runtime/arrays/array_to_hash.rs new file mode 100644 index 000000000..d7b0cf239 --- /dev/null +++ b/src/codegen/runtime/arrays/array_to_hash.rs @@ -0,0 +1,177 @@ +//! Purpose: +//! Emits the `__rt_array_to_hash` runtime helper that converts an indexed array to an owned hash. +//! Lets the hash-based array builtins accept indexed-array inputs (keys 0..n-1). +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - String values are persisted (independent copies) and heap values retained, so the result owns its payloads. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_to_hash: build an owned hash {0: e0, 1: e1, ...} from an indexed array. +/// Input: x0 = indexed array pointer +/// Output: x0 = new owned hash table with integer keys 0..length-1 +/// +/// Reads the indexed value_type to extract each element: string elements (16-byte slots) +/// are persisted into independent heap copies; heap-backed elements are retained; scalar +/// elements are copied by value. Used to accept indexed inputs in the hash-based builtins. 
+pub fn emit_array_to_hash(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_to_hash_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_to_hash ---"); + emitter.label_global("__rt_array_to_hash"); + emitter.instruction("sub sp, sp, #80"); // allocate the conversion stack frame + emitter.instruction("stp x29, x30, [sp, #64]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #64"); // set up the new frame pointer + emitter.instruction("str x0, [sp, #0]"); // save the indexed array pointer + emitter.instruction("ldr x9, [x0]"); // load the indexed array length + emitter.instruction("str x9, [sp, #24]"); // save the length + emitter.instruction("ldr x10, [x0, #-8]"); // load the uniform heap-kind header word + emitter.instruction("lsr x10, x10, #8"); // shift the packed value_type into the low bits + emitter.instruction("and x10, x10, #0x7f"); // isolate the indexed-array value_type (also the Mixed tag) + emitter.instruction("str x10, [sp, #32]"); // save the value_type / runtime tag + emitter.instruction("ldr x11, [x0, #16]"); // load the element size (stride) from the header + emitter.instruction("str x11, [sp, #40]"); // save the element stride + emitter.instruction("mov x1, x10"); // value_type for the new hash header + emitter.instruction("cmp x9, #8"); // is the length below the minimum hash capacity? 
+ emitter.instruction("b.ge __rt_array_to_hash_cap_ok"); // use the length as the capacity hint + emitter.instruction("mov x9, #8"); // clamp the capacity hint to a small minimum + emitter.label("__rt_array_to_hash_cap_ok"); + emitter.instruction("mov x0, x9"); // capacity hint for the new hash + emitter.instruction("bl __rt_hash_new"); // allocate the result hash, x0 = result + emitter.instruction("str x0, [sp, #8]"); // save the result hash pointer + emitter.instruction("str xzr, [sp, #16]"); // index i = 0 + emitter.label("__rt_array_to_hash_loop"); + emitter.instruction("ldr x9, [sp, #24]"); // reload the length + emitter.instruction("ldr x10, [sp, #16]"); // reload the index + emitter.instruction("cmp x10, x9"); // has the index reached the length? + emitter.instruction("b.ge __rt_array_to_hash_done"); // all elements converted + emitter.instruction("ldr x11, [sp, #0]"); // reload the indexed array pointer + emitter.instruction("add x11, x11, #24"); // skip the 24-byte indexed-array header + emitter.instruction("ldr x12, [sp, #40]"); // reload the element stride + emitter.instruction("mul x13, x10, x12"); // byte offset of element[i] + emitter.instruction("add x11, x11, x13"); // x11 = address of element[i] + emitter.instruction("ldr x3, [x11]"); // load the element low word + emitter.instruction("str x3, [sp, #48]"); // save the element low word + emitter.instruction("ldr x9, [sp, #32]"); // reload the value_type + emitter.instruction("cmp x9, #1"); // is the element a string? + emitter.instruction("b.eq __rt_array_to_hash_string"); // strings need persistence + emitter.instruction("mov x9, #0"); // non-string elements have no high word + emitter.instruction("str x9, [sp, #56]"); // save a zero high word + emitter.instruction("ldr x9, [sp, #32]"); // reload the value_type + emitter.instruction("cmp x9, #4"); // is the element below the heap-backed tag range? 
+ emitter.instruction("b.lt __rt_array_to_hash_set"); // scalar elements need no retain + emitter.instruction("cmp x9, #7"); // is the element above the heap-backed tag range? + emitter.instruction("b.gt __rt_array_to_hash_set"); // non-heap tags need no retain + emitter.instruction("ldr x0, [sp, #48]"); // load the heap-backed element pointer + emitter.instruction("bl __rt_incref"); // retain the heap-backed element for the result hash + emitter.instruction("b __rt_array_to_hash_set"); // continue to insertion + emitter.label("__rt_array_to_hash_string"); + emitter.instruction("ldr x2, [x11, #8]"); // load the string length from the 16-byte slot + emitter.instruction("ldr x1, [sp, #48]"); // load the string pointer + emitter.instruction("bl __rt_str_persist"); // copy the string into an independent heap block, x1 = new pointer + emitter.instruction("str x1, [sp, #48]"); // save the persisted string pointer + emitter.instruction("str x2, [sp, #56]"); // save the string length + emitter.label("__rt_array_to_hash_set"); + emitter.instruction("ldr x0, [sp, #8]"); // result hash pointer + emitter.instruction("ldr x1, [sp, #16]"); // integer key = index i + emitter.instruction("mov x2, #-1"); // key_hi = -1 marks an integer key + emitter.instruction("ldr x3, [sp, #48]"); // value low word + emitter.instruction("ldr x4, [sp, #56]"); // value high word + emitter.instruction("ldr x5, [sp, #32]"); // value runtime tag (= value_type) + emitter.instruction("bl __rt_hash_set"); // insert element[i] at integer key i + emitter.instruction("str x0, [sp, #8]"); // update the result pointer after possible reallocation + emitter.instruction("ldr x10, [sp, #16]"); // reload the index + emitter.instruction("add x10, x10, #1"); // advance to the next element + emitter.instruction("str x10, [sp, #16]"); // save the advanced index + emitter.instruction("b __rt_array_to_hash_loop"); // continue converting elements + emitter.label("__rt_array_to_hash_done"); + emitter.instruction("ldr x0, 
[sp, #8]"); // x0 = result hash pointer + emitter.instruction("ldp x29, x30, [sp, #64]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #80"); // deallocate the stack frame + emitter.instruction("ret"); // return the result hash in x0 +} + +/// x86_64 Linux implementation of `__rt_array_to_hash`. +/// Input: rdi = indexed array pointer +/// Output: rax = new owned hash with integer keys 0..length-1 +fn emit_array_to_hash_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_to_hash ---"); + emitter.label_global("__rt_array_to_hash"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("sub rsp, 64"); // reserve local slots for the conversion loop state + emitter.instruction("mov QWORD PTR [rbp - 8], rdi"); // save the indexed array pointer + emitter.instruction("mov rax, QWORD PTR [rdi]"); // load the indexed array length + emitter.instruction("mov QWORD PTR [rbp - 16], rax"); // save the length + emitter.instruction("mov r10, QWORD PTR [rdi - 8]"); // load the uniform heap-kind header word + emitter.instruction("shr r10, 8"); // shift the packed value_type into the low bits + emitter.instruction("and r10, 127"); // isolate the indexed-array value_type (also the Mixed tag) + emitter.instruction("mov QWORD PTR [rbp - 24], r10"); // save the value_type / runtime tag + emitter.instruction("mov r11, QWORD PTR [rdi + 16]"); // load the element size (stride) from the header + emitter.instruction("mov QWORD PTR [rbp - 32], r11"); // save the element stride + emitter.instruction("mov rsi, r10"); // value_type for the new hash header + emitter.instruction("mov rdi, rax"); // capacity hint = length + emitter.instruction("cmp rdi, 8"); // is the length below the minimum hash capacity? 
+ emitter.instruction("jge __rt_array_to_hash_cap_ok"); // use the length as the capacity hint + emitter.instruction("mov rdi, 8"); // clamp the capacity hint to a small minimum + emitter.label("__rt_array_to_hash_cap_ok"); + emitter.instruction("call __rt_hash_new"); // allocate the result hash, rax = result + emitter.instruction("mov QWORD PTR [rbp - 40], rax"); // save the result hash pointer + emitter.instruction("mov QWORD PTR [rbp - 48], 0"); // index i = 0 + emitter.label("__rt_array_to_hash_loop"); + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // reload the index + emitter.instruction("cmp rax, QWORD PTR [rbp - 16]"); // has the index reached the length? + emitter.instruction("jge __rt_array_to_hash_done"); // all elements converted + emitter.instruction("mov r10, QWORD PTR [rbp - 8]"); // reload the indexed array pointer + emitter.instruction("add r10, 24"); // skip the 24-byte indexed-array header + emitter.instruction("mov r11, QWORD PTR [rbp - 32]"); // reload the element stride + emitter.instruction("imul r11, rax"); // byte offset of element[i] + emitter.instruction("add r10, r11"); // r10 = address of element[i] + emitter.instruction("mov rcx, QWORD PTR [r10]"); // load the element low word + emitter.instruction("mov QWORD PTR [rbp - 56], rcx"); // save the element low word + emitter.instruction("mov r9, QWORD PTR [rbp - 24]"); // reload the value_type + emitter.instruction("cmp r9, 1"); // is the element a string? + emitter.instruction("je __rt_array_to_hash_string"); // strings need persistence + emitter.instruction("mov QWORD PTR [rbp - 64], 0"); // non-string elements have no high word + emitter.instruction("cmp r9, 4"); // is the element below the heap-backed tag range? + emitter.instruction("jl __rt_array_to_hash_set"); // scalar elements need no retain + emitter.instruction("cmp r9, 7"); // is the element above the heap-backed tag range? 
+ emitter.instruction("jg __rt_array_to_hash_set"); // non-heap tags need no retain + emitter.instruction("mov rdi, QWORD PTR [rbp - 56]"); // load the heap-backed element pointer + emitter.instruction("call __rt_incref"); // retain the heap-backed element for the result hash + emitter.instruction("jmp __rt_array_to_hash_set"); // continue to insertion + emitter.label("__rt_array_to_hash_string"); + emitter.instruction("mov rdx, QWORD PTR [r10 + 8]"); // load the string length from the 16-byte slot + emitter.instruction("mov rax, QWORD PTR [rbp - 56]"); // load the string pointer + emitter.instruction("call __rt_str_persist"); // copy the string into an independent heap block, rax = new pointer + emitter.instruction("mov QWORD PTR [rbp - 56], rax"); // save the persisted string pointer + emitter.instruction("mov QWORD PTR [rbp - 64], rdx"); // save the string length + emitter.label("__rt_array_to_hash_set"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 40]"); // result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 48]"); // integer key = index i + emitter.instruction("mov rdx, -1"); // key_hi = -1 marks an integer key + emitter.instruction("mov rcx, QWORD PTR [rbp - 56]"); // value low word + emitter.instruction("mov r8, QWORD PTR [rbp - 64]"); // value high word + emitter.instruction("mov r9, QWORD PTR [rbp - 24]"); // value runtime tag (= value_type) + emitter.instruction("call __rt_hash_set"); // insert element[i] at integer key i + emitter.instruction("mov QWORD PTR [rbp - 40], rax"); // update the result pointer after possible reallocation + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // reload the index + emitter.instruction("add rax, 1"); // advance to the next element + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // save the advanced index + emitter.instruction("jmp __rt_array_to_hash_loop"); // continue converting elements + emitter.label("__rt_array_to_hash_done"); + emitter.instruction("mov rax, QWORD PTR [rbp - 
40]"); // rax = result hash pointer + emitter.instruction("add rsp, 64"); // release the local slots + emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return the result hash in rax +} + diff --git a/src/codegen/runtime/arrays/array_udiff_uintersect.rs b/src/codegen/runtime/arrays/array_udiff_uintersect.rs new file mode 100644 index 000000000..d3d2341cb --- /dev/null +++ b/src/codegen/runtime/arrays/array_udiff_uintersect.rs @@ -0,0 +1,173 @@ +//! Purpose: +//! Emits the `__rt_array_udiff_uintersect` runtime helper for array_udiff / array_uintersect. +//! Compares elements of two indexed arrays with a user comparator (equal when cmp returns 0). +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - O(n*m) nested scan; scalar (8-byte) elements; result preallocated to arr1 capacity so pushes never reallocate. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_udiff_uintersect: filter arr1 by comparator-based membership in arr2. +/// Input: x0 = comparator address, x1 = arr1, x2 = arr2, x3 = optional environment, x4 = mode +/// Output: x0 = new indexed array of kept elements (repacked at sequential indices) +/// +/// For each arr1 element, scans arr2 calling `cmp(a, b [, env])`; a zero result means equal. +/// mode 0 (udiff) keeps elements absent from arr2; mode 1 (uintersect) keeps elements present. +/// x19 (comparator), x20 (env) and x21 (mode) are callee-saved across the comparator calls. 
+pub fn emit_array_udiff_uintersect(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_udiff_uintersect_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_udiff_uintersect ---"); + emitter.label_global("__rt_array_udiff_uintersect"); + emitter.instruction("sub sp, sp, #96"); // allocate the udiff/uintersect stack frame + emitter.instruction("stp x29, x30, [sp, #80]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #80"); // set up the new frame pointer + emitter.instruction("stp x19, x20, [sp, #64]"); // save callee-saved comparator address and environment + emitter.instruction("str x21, [sp, #56]"); // save callee-saved mode selector + emitter.instruction("mov x19, x0"); // x19 = comparator address (callee-saved) + emitter.instruction("str x1, [sp, #0]"); // save arr1 pointer + emitter.instruction("str x2, [sp, #8]"); // save arr2 pointer + emitter.instruction("mov x20, x3"); // x20 = optional environment (callee-saved) + emitter.instruction("mov x21, x4"); // x21 = mode (0 = udiff, 1 = uintersect) + emitter.instruction("ldr x0, [x1, #8]"); // x0 = arr1 capacity for the result allocation + emitter.instruction("mov x1, #8"); // result element size = 8 bytes (scalar) + emitter.instruction("bl __rt_array_new"); // allocate the result array, x0 = result + emitter.instruction("str x0, [sp, #16]"); // save the result array pointer + emitter.instruction("str xzr, [sp, #24]"); // outer index i = 0 + emitter.label("__rt_array_udiff_uintersect_outer"); + emitter.instruction("ldr x0, [sp, #0]"); // reload arr1 pointer + emitter.instruction("ldr x3, [x0]"); // x3 = arr1 length + emitter.instruction("ldr x4, [sp, #24]"); // x4 = outer index i + emitter.instruction("cmp x4, x3"); // has i reached arr1 length? 
+ emitter.instruction("b.ge __rt_array_udiff_uintersect_done"); // finish once every arr1 element is processed + emitter.instruction("add x5, x0, #24"); // x5 = arr1 data base + emitter.instruction("ldr x6, [x5, x4, lsl #3]"); // x6 = arr1[i] + emitter.instruction("str x6, [sp, #40]"); // save the current element across comparator calls + emitter.instruction("str xzr, [sp, #32]"); // inner index j = 0 + emitter.label("__rt_array_udiff_uintersect_inner"); + emitter.instruction("ldr x9, [sp, #32]"); // reload inner index j + emitter.instruction("ldr x1, [sp, #8]"); // reload arr2 pointer + emitter.instruction("ldr x7, [x1]"); // x7 = arr2 length + emitter.instruction("cmp x9, x7"); // has j reached arr2 length? + emitter.instruction("b.ge __rt_array_udiff_uintersect_absent"); // no arr2 element compared equal + emitter.instruction("add x8, x1, #24"); // x8 = arr2 data base + emitter.instruction("ldr x10, [x8, x9, lsl #3]"); // x10 = arr2[j] + emitter.instruction("ldr x0, [sp, #40]"); // comparator argument a = current arr1 element + emitter.instruction("mov x1, x10"); // comparator argument b = arr2[j] + emitter.instruction("cbz x20, __rt_array_udiff_uintersect_cmp"); // no environment keeps the two-argument comparator ABI + emitter.instruction("mov x2, x20"); // pass the environment as the third comparator argument + emitter.label("__rt_array_udiff_uintersect_cmp"); + emitter.instruction("blr x19"); // call cmp(a, b [, env]); zero result means equal + emitter.instruction("cbz x0, __rt_array_udiff_uintersect_present"); // a zero comparator result means the element is in arr2 + emitter.instruction("ldr x9, [sp, #32]"); // reload inner index j + emitter.instruction("add x9, x9, #1"); // advance to the next arr2 element + emitter.instruction("str x9, [sp, #32]"); // save the advanced inner index + emitter.instruction("b __rt_array_udiff_uintersect_inner"); // continue scanning arr2 + emitter.label("__rt_array_udiff_uintersect_present"); + emitter.instruction("cbz x21, 
__rt_array_udiff_uintersect_advance"); // udiff drops elements present in arr2 + emitter.instruction("b __rt_array_udiff_uintersect_push"); // uintersect keeps elements present in arr2 + emitter.label("__rt_array_udiff_uintersect_absent"); + emitter.instruction("cbz x21, __rt_array_udiff_uintersect_push"); // udiff keeps elements absent from arr2 + emitter.instruction("b __rt_array_udiff_uintersect_advance"); // uintersect drops elements absent from arr2 + emitter.label("__rt_array_udiff_uintersect_push"); + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result array pointer + emitter.instruction("ldr x1, [sp, #40]"); // x1 = element value to keep + emitter.instruction("bl __rt_array_push_int"); // append the kept element to the preallocated result + emitter.label("__rt_array_udiff_uintersect_advance"); + emitter.instruction("ldr x4, [sp, #24]"); // reload the outer index i + emitter.instruction("add x4, x4, #1"); // advance to the next arr1 element + emitter.instruction("str x4, [sp, #24]"); // save the advanced outer index + emitter.instruction("b __rt_array_udiff_uintersect_outer"); // continue the outer loop + emitter.label("__rt_array_udiff_uintersect_done"); + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result array pointer + emitter.instruction("ldr x21, [sp, #56]"); // restore the callee-saved mode selector + emitter.instruction("ldp x19, x20, [sp, #64]"); // restore callee-saved comparator address and environment + emitter.instruction("ldp x29, x30, [sp, #80]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #96"); // deallocate the stack frame + emitter.instruction("ret"); // return the result array in x0 +} + +/// x86_64 Linux implementation of `__rt_array_udiff_uintersect`. 
+/// Input: rdi = comparator, rsi = arr1, rdx = arr2, rcx = optional environment, r8 = mode +/// Output: rax = new indexed array of kept elements +fn emit_array_udiff_uintersect_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_udiff_uintersect ---"); + emitter.label_global("__rt_array_udiff_uintersect"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("push r12"); // preserve the comparator address across the loops + emitter.instruction("push r13"); // preserve the environment across the comparator calls + emitter.instruction("push r14"); // preserve the mode selector across the comparator calls + emitter.instruction("sub rsp, 48"); // reserve local slots for arr1/arr2/result/i/j/element + emitter.instruction("mov r12, rdi"); // r12 = comparator address (callee-saved) + emitter.instruction("mov r13, rcx"); // r13 = optional environment (callee-saved) + emitter.instruction("mov r14, r8"); // r14 = mode (0 = udiff, 1 = uintersect) + emitter.instruction("mov QWORD PTR [rbp - 32], rsi"); // save arr1 pointer + emitter.instruction("mov QWORD PTR [rbp - 40], rdx"); // save arr2 pointer + emitter.instruction("mov rdi, QWORD PTR [rsi + 8]"); // rdi = arr1 capacity for the result allocation + emitter.instruction("mov rsi, 8"); // result element size = 8 bytes (scalar) + emitter.instruction("call __rt_array_new"); // allocate the result array, rax = result + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // save the result array pointer + emitter.instruction("mov QWORD PTR [rbp - 56], 0"); // outer index i = 0 + emitter.label("__rt_array_udiff_uintersect_outer"); + emitter.instruction("mov r10, QWORD PTR [rbp - 32]"); // reload arr1 pointer + emitter.instruction("mov rax, QWORD PTR [rbp - 56]"); // reload the outer index i + emitter.instruction("cmp rax, QWORD PTR [r10]"); // has i reached arr1 length? 
+ emitter.instruction("jge __rt_array_udiff_uintersect_done"); // finish once every arr1 element is processed + emitter.instruction("mov r11, QWORD PTR [r10 + rax * 8 + 24]"); // r11 = arr1[i] + emitter.instruction("mov QWORD PTR [rbp - 64], r11"); // save the current element across comparator calls + emitter.instruction("mov QWORD PTR [rbp - 72], 0"); // inner index j = 0 + emitter.label("__rt_array_udiff_uintersect_inner"); + emitter.instruction("mov rax, QWORD PTR [rbp - 72]"); // reload inner index j + emitter.instruction("mov r10, QWORD PTR [rbp - 40]"); // reload arr2 pointer + emitter.instruction("cmp rax, QWORD PTR [r10]"); // has j reached arr2 length? + emitter.instruction("jge __rt_array_udiff_uintersect_absent"); // no arr2 element compared equal + emitter.instruction("mov rsi, QWORD PTR [r10 + rax * 8 + 24]"); // comparator argument b = arr2[j] + emitter.instruction("mov rdi, QWORD PTR [rbp - 64]"); // comparator argument a = current arr1 element + emitter.instruction("test r13, r13"); // is an environment present? + emitter.instruction("jz __rt_array_udiff_uintersect_cmp"); // no environment keeps the two-argument comparator ABI + emitter.instruction("mov rdx, r13"); // pass the environment as the third comparator argument + emitter.label("__rt_array_udiff_uintersect_cmp"); + emitter.instruction("call r12"); // call cmp(a, b [, env]); zero result means equal + emitter.instruction("test rax, rax"); // did the comparator report equality? 
+ emitter.instruction("jz __rt_array_udiff_uintersect_present"); // a zero comparator result means the element is in arr2 + emitter.instruction("mov rax, QWORD PTR [rbp - 72]"); // reload inner index j + emitter.instruction("add rax, 1"); // advance to the next arr2 element + emitter.instruction("mov QWORD PTR [rbp - 72], rax"); // save the advanced inner index + emitter.instruction("jmp __rt_array_udiff_uintersect_inner"); // continue scanning arr2 + emitter.label("__rt_array_udiff_uintersect_present"); + emitter.instruction("test r14, r14"); // is the mode udiff (0)? + emitter.instruction("jz __rt_array_udiff_uintersect_advance"); // udiff drops elements present in arr2 + emitter.instruction("jmp __rt_array_udiff_uintersect_push"); // uintersect keeps elements present in arr2 + emitter.label("__rt_array_udiff_uintersect_absent"); + emitter.instruction("test r14, r14"); // is the mode udiff (0)? + emitter.instruction("jz __rt_array_udiff_uintersect_push"); // udiff keeps elements absent from arr2 + emitter.instruction("jmp __rt_array_udiff_uintersect_advance"); // uintersect drops elements absent from arr2 + emitter.label("__rt_array_udiff_uintersect_push"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 48]"); // rdi = result array pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 64]"); // rsi = element value to keep + emitter.instruction("call __rt_array_push_int"); // append the kept element to the preallocated result + emitter.label("__rt_array_udiff_uintersect_advance"); + emitter.instruction("mov rax, QWORD PTR [rbp - 56]"); // reload the outer index i + emitter.instruction("add rax, 1"); // advance to the next arr1 element + emitter.instruction("mov QWORD PTR [rbp - 56], rax"); // save the advanced outer index + emitter.instruction("jmp __rt_array_udiff_uintersect_outer"); // continue the outer loop + emitter.label("__rt_array_udiff_uintersect_done"); + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // rax = result array pointer + 
emitter.instruction("add rsp, 48"); // release the local slots + emitter.instruction("pop r14"); // restore the mode register + emitter.instruction("pop r13"); // restore the environment register + emitter.instruction("pop r12"); // restore the comparator register + emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return the result array in rax +} + diff --git a/src/codegen/runtime/arrays/array_walk_recursive.rs b/src/codegen/runtime/arrays/array_walk_recursive.rs new file mode 100644 index 000000000..b371a64c2 --- /dev/null +++ b/src/codegen/runtime/arrays/array_walk_recursive.rs @@ -0,0 +1,207 @@ +//! Purpose: +//! Emits the `__rt_array_walk_recursive` runtime helper for array_walk_recursive. +//! Recursively walks nested indexed/associative arrays, invoking the callback on each non-array leaf value. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - Mirrors `__rt_array_walk` (scalar leaf passed in one word, env optional) but descends into array-valued elements. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// array_walk_recursive: invoke the callback on every non-array leaf of a nested array. +/// Input: x0 = callback address, x1 = array pointer, x2 = optional callback environment +/// Output: none (void); the callback return value is discarded +/// +/// Indexed arrays whose element type is itself an array recurse element-by-element; hashes +/// recurse into array-tagged values and call the callback on scalar leaves. The callback is +/// invoked as `(leaf, env)` when env is non-null, else `(leaf)`. x19 (callback) and x20 (env) +/// are callee-saved, so they survive both the callback calls and the recursive descent. 
+pub fn emit_array_walk_recursive(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_array_walk_recursive_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: array_walk_recursive ---"); + emitter.label_global("__rt_array_walk_recursive"); + emitter.instruction("sub sp, sp, #64"); // allocate the recursive-walk stack frame + emitter.instruction("stp x29, x30, [sp, #48]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #48"); // set up the new frame pointer + emitter.instruction("stp x19, x20, [sp, #32]"); // save callee-saved callback address and environment + emitter.instruction("mov x19, x0"); // x19 = callback address (callee-saved across recursion and calls) + emitter.instruction("mov x20, x2"); // x20 = optional callback environment (callee-saved) + emitter.instruction("str x1, [sp, #0]"); // save the current array pointer + emitter.instruction("ldr x9, [x1, #-8]"); // load the uniform heap-kind header word + emitter.instruction("and x10, x9, #0xff"); // isolate the low-byte heap kind + emitter.instruction("cmp x10, #3"); // is the container an associative hash? + emitter.instruction("b.eq __rt_array_walk_recursive_hash"); // hashes iterate in insertion order + emitter.instruction("lsr x11, x9, #8"); // shift the packed indexed-array value_type into the low bits + emitter.instruction("and x11, x11, #0x7f"); // isolate the indexed-array value_type tag + emitter.instruction("ldr x12, [x1]"); // load the indexed-array length + emitter.instruction("str x12, [sp, #8]"); // save the length for the loop bound + emitter.instruction("str xzr, [sp, #16]"); // index i = 0 + emitter.instruction("cmp x11, #4"); // are the elements indexed sub-arrays? + emitter.instruction("b.eq __rt_array_walk_recursive_idx_rec"); // recurse into indexed sub-array elements + emitter.instruction("cmp x11, #5"); // are the elements associative sub-arrays? 
+ emitter.instruction("b.eq __rt_array_walk_recursive_idx_rec"); // recurse into associative sub-array elements + emitter.label("__rt_array_walk_recursive_idx_leaf"); + emitter.instruction("ldr x12, [sp, #8]"); // reload the length + emitter.instruction("ldr x13, [sp, #16]"); // reload the index + emitter.instruction("cmp x13, x12"); // has the index reached the length? + emitter.instruction("b.ge __rt_array_walk_recursive_done"); // finish once every element is visited + emitter.instruction("ldr x1, [sp, #0]"); // reload the array pointer + emitter.instruction("add x1, x1, #24"); // skip the 24-byte indexed-array header + emitter.instruction("ldr x0, [x1, x13, lsl #3]"); // load the scalar leaf at element[i] + emitter.instruction("cbz x20, __rt_array_walk_recursive_idx_call"); // no environment keeps the one-argument callback ABI + emitter.instruction("mov x1, x20"); // pass the callback environment as the second argument + emitter.label("__rt_array_walk_recursive_idx_call"); + emitter.instruction("blr x19"); // invoke callback(leaf [, env]); return value discarded + emitter.instruction("ldr x13, [sp, #16]"); // reload the index after the callback call + emitter.instruction("add x13, x13, #1"); // advance to the next element + emitter.instruction("str x13, [sp, #16]"); // save the advanced index + emitter.instruction("b __rt_array_walk_recursive_idx_leaf"); // continue the scalar-leaf loop + emitter.label("__rt_array_walk_recursive_idx_rec"); + emitter.instruction("ldr x12, [sp, #8]"); // reload the length + emitter.instruction("ldr x13, [sp, #16]"); // reload the index + emitter.instruction("cmp x13, x12"); // has the index reached the length? 
+ emitter.instruction("b.ge __rt_array_walk_recursive_done"); // finish once every sub-array is visited + emitter.instruction("ldr x1, [sp, #0]"); // reload the array pointer + emitter.instruction("add x1, x1, #24"); // skip the 24-byte indexed-array header + emitter.instruction("ldr x1, [x1, x13, lsl #3]"); // load the sub-array pointer at element[i] + emitter.instruction("mov x0, x19"); // pass the callback address to the recursive call + emitter.instruction("mov x2, x20"); // pass the callback environment to the recursive call + emitter.instruction("bl __rt_array_walk_recursive"); // recurse into the sub-array + emitter.instruction("ldr x13, [sp, #16]"); // reload the index after the recursive call + emitter.instruction("add x13, x13, #1"); // advance to the next sub-array + emitter.instruction("str x13, [sp, #16]"); // save the advanced index + emitter.instruction("b __rt_array_walk_recursive_idx_rec"); // continue the recursive descent loop + emitter.label("__rt_array_walk_recursive_hash"); + emitter.instruction("str xzr, [sp, #16]"); // iterator cursor = 0 + emitter.label("__rt_array_walk_recursive_hash_loop"); + emitter.instruction("ldr x0, [sp, #0]"); // reload the hash pointer + emitter.instruction("ldr x1, [sp, #16]"); // reload the iterator cursor + emitter.instruction("bl __rt_hash_iter_next"); // next entry: x0=cursor,x1=kptr,x2=klen,x3=vlo,x4=vhi,x5=vtag + emitter.instruction("cmn x0, #1"); // has iteration reached the end (cursor == -1)? + emitter.instruction("b.eq __rt_array_walk_recursive_done"); // finish once every entry is visited + emitter.instruction("str x0, [sp, #16]"); // save the next iterator cursor + emitter.instruction("cmp x5, #4"); // is the value an indexed sub-array? + emitter.instruction("b.eq __rt_array_walk_recursive_hash_rec"); // recurse into indexed sub-array values + emitter.instruction("cmp x5, #5"); // is the value an associative sub-array? 
+ emitter.instruction("b.eq __rt_array_walk_recursive_hash_rec"); // recurse into associative sub-array values + emitter.instruction("mov x0, x3"); // scalar leaf value goes in the first callback argument + emitter.instruction("cbz x20, __rt_array_walk_recursive_hash_call"); // no environment keeps the one-argument callback ABI + emitter.instruction("mov x1, x20"); // pass the callback environment as the second argument + emitter.label("__rt_array_walk_recursive_hash_call"); + emitter.instruction("blr x19"); // invoke callback(leaf [, env]); return value discarded + emitter.instruction("b __rt_array_walk_recursive_hash_loop"); // continue iterating the hash entries + emitter.label("__rt_array_walk_recursive_hash_rec"); + emitter.instruction("mov x0, x19"); // pass the callback address to the recursive call + emitter.instruction("mov x1, x3"); // pass the sub-array value pointer to the recursive call + emitter.instruction("mov x2, x20"); // pass the callback environment to the recursive call + emitter.instruction("bl __rt_array_walk_recursive"); // recurse into the sub-array value + emitter.instruction("b __rt_array_walk_recursive_hash_loop"); // continue iterating the hash entries + emitter.label("__rt_array_walk_recursive_done"); + emitter.instruction("ldp x19, x20, [sp, #32]"); // restore callee-saved callback address and environment + emitter.instruction("ldp x29, x30, [sp, #48]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #64"); // deallocate the stack frame + emitter.instruction("ret"); // return (void) +} + +/// x86_64 Linux implementation of `__rt_array_walk_recursive`. +/// Input: rdi = callback address, rsi = array pointer, rdx = optional callback environment +/// Output: none (void). Callee-saved r12 (callback) and r14 (environment) survive recursion. 
+fn emit_array_walk_recursive_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: array_walk_recursive ---"); + emitter.label_global("__rt_array_walk_recursive"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("push r12"); // preserve the callback address across recursion and calls + emitter.instruction("push r13"); // preserve a scratch callee-saved register + emitter.instruction("push r14"); // preserve the optional callback environment across calls + emitter.instruction("sub rsp, 24"); // reserve local slots for the array pointer, length, and index/cursor + emitter.instruction("mov r12, rdi"); // r12 = callback address (callee-saved) + emitter.instruction("mov r14, rdx"); // r14 = optional callback environment (callee-saved) + emitter.instruction("mov QWORD PTR [rbp - 32], rsi"); // save the current array pointer + emitter.instruction("mov r10, QWORD PTR [rsi - 8]"); // load the uniform heap-kind header word + emitter.instruction("mov r11, r10"); // copy the header word before masking the heap kind + emitter.instruction("and r11, 255"); // isolate the low-byte heap kind + emitter.instruction("cmp r11, 3"); // is the container an associative hash? + emitter.instruction("je __rt_array_walk_recursive_hash"); // hashes iterate in insertion order + emitter.instruction("shr r10, 8"); // shift the packed indexed-array value_type into the low bits + emitter.instruction("and r10, 127"); // isolate the indexed-array value_type tag + emitter.instruction("mov rax, QWORD PTR [rsi]"); // load the indexed-array length + emitter.instruction("mov QWORD PTR [rbp - 40], rax"); // save the length for the loop bound + emitter.instruction("mov QWORD PTR [rbp - 48], 0"); // index i = 0 + emitter.instruction("cmp r10, 4"); // are the elements indexed sub-arrays? 
+ emitter.instruction("je __rt_array_walk_recursive_idx_rec"); // recurse into indexed sub-array elements + emitter.instruction("cmp r10, 5"); // are the elements associative sub-arrays? + emitter.instruction("je __rt_array_walk_recursive_idx_rec"); // recurse into associative sub-array elements + emitter.label("__rt_array_walk_recursive_idx_leaf"); + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // reload the index + emitter.instruction("cmp rax, QWORD PTR [rbp - 40]"); // has the index reached the length? + emitter.instruction("jge __rt_array_walk_recursive_done"); // finish once every element is visited + emitter.instruction("mov r10, QWORD PTR [rbp - 32]"); // reload the array pointer + emitter.instruction("mov rdi, QWORD PTR [r10 + rax * 8 + 24]"); // load the scalar leaf at element[i] + emitter.instruction("test r14, r14"); // is a callback environment present? + emitter.instruction("jz __rt_array_walk_recursive_idx_call"); // no environment keeps the one-argument callback ABI + emitter.instruction("mov rsi, r14"); // pass the callback environment as the second argument + emitter.label("__rt_array_walk_recursive_idx_call"); + emitter.instruction("call r12"); // invoke callback(leaf [, env]); return value discarded + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // reload the index after the callback call + emitter.instruction("add rax, 1"); // advance to the next element + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // save the advanced index + emitter.instruction("jmp __rt_array_walk_recursive_idx_leaf"); // continue the scalar-leaf loop + emitter.label("__rt_array_walk_recursive_idx_rec"); + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // reload the index + emitter.instruction("cmp rax, QWORD PTR [rbp - 40]"); // has the index reached the length? 
+ emitter.instruction("jge __rt_array_walk_recursive_done"); // finish once every sub-array is visited + emitter.instruction("mov r10, QWORD PTR [rbp - 32]"); // reload the array pointer + emitter.instruction("mov rsi, QWORD PTR [r10 + rax * 8 + 24]"); // load the sub-array pointer at element[i] + emitter.instruction("mov rdi, r12"); // pass the callback address to the recursive call + emitter.instruction("mov rdx, r14"); // pass the callback environment to the recursive call + emitter.instruction("call __rt_array_walk_recursive"); // recurse into the sub-array + emitter.instruction("mov rax, QWORD PTR [rbp - 48]"); // reload the index after the recursive call + emitter.instruction("add rax, 1"); // advance to the next sub-array + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // save the advanced index + emitter.instruction("jmp __rt_array_walk_recursive_idx_rec"); // continue the recursive descent loop + emitter.label("__rt_array_walk_recursive_hash"); + emitter.instruction("mov QWORD PTR [rbp - 48], 0"); // iterator cursor = 0 + emitter.label("__rt_array_walk_recursive_hash_loop"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 32]"); // reload the hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 48]"); // reload the iterator cursor + emitter.instruction("call __rt_hash_iter_next"); // next entry: rax=cursor,rdi=kptr,rdx=klen,rcx=vlo,r8=vhi,r9=vtag + emitter.instruction("cmp rax, -1"); // has iteration reached the end? + emitter.instruction("je __rt_array_walk_recursive_done"); // finish once every entry is visited + emitter.instruction("mov QWORD PTR [rbp - 48], rax"); // save the next iterator cursor + emitter.instruction("mov r13, rcx"); // keep the value low word in the preserved scratch register ([rbp - 24] holds the caller's saved r14 and must not be overwritten) + emitter.instruction("cmp r9, 4"); // is the value an indexed sub-array?
+ emitter.instruction("je __rt_array_walk_recursive_hash_rec"); // recurse into indexed sub-array values + emitter.instruction("cmp r9, 5"); // is the value an associative sub-array? + emitter.instruction("je __rt_array_walk_recursive_hash_rec"); // recurse into associative sub-array values + emitter.instruction("mov rdi, r13"); // scalar leaf value goes in the first callback argument + emitter.instruction("test r14, r14"); // is a callback environment present? + emitter.instruction("jz __rt_array_walk_recursive_hash_call"); // no environment keeps the one-argument callback ABI + emitter.instruction("mov rsi, r14"); // pass the callback environment as the second argument + emitter.label("__rt_array_walk_recursive_hash_call"); + emitter.instruction("call r12"); // invoke callback(leaf [, env]); return value discarded + emitter.instruction("jmp __rt_array_walk_recursive_hash_loop"); // continue iterating the hash entries + emitter.label("__rt_array_walk_recursive_hash_rec"); + emitter.instruction("mov rdi, r12"); // pass the callback address to the recursive call + emitter.instruction("mov rsi, r13"); // pass the sub-array value pointer to the recursive call + emitter.instruction("mov rdx, r14"); // pass the callback environment to the recursive call + emitter.instruction("call __rt_array_walk_recursive"); // recurse into the sub-array value + emitter.instruction("jmp __rt_array_walk_recursive_hash_loop"); // continue iterating the hash entries + emitter.label("__rt_array_walk_recursive_done"); + emitter.instruction("add rsp, 24"); // release the local bookkeeping slots + emitter.instruction("pop r14"); // restore the caller environment register + emitter.instruction("pop r13"); // restore the caller scratch register + emitter.instruction("pop r12"); // restore the caller callback register + emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return (void) +} + diff --git
a/src/codegen/runtime/arrays/assoc_diff_intersect.rs b/src/codegen/runtime/arrays/assoc_diff_intersect.rs new file mode 100644 index 000000000..c81b968aa --- /dev/null +++ b/src/codegen/runtime/arrays/assoc_diff_intersect.rs @@ -0,0 +1,249 @@ +//! Purpose: +//! Emits the `__rt_assoc_diff_intersect` runtime helper for array_diff_assoc / array_intersect_assoc. +//! Keeps entries of hash1 whose (key, value) pair is absent from (diff) or present in (intersect) hash2. +//! +//! Called from: +//! - `crate::codegen::runtime::emitters::emit_runtime()` via `crate::codegen::runtime::arrays`. +//! +//! Key details: +//! - Values compare by PHP string cast: `(string)a === (string)b`. Temporary Mixed boxes are released to avoid leaks. + +use crate::codegen::emit::Emitter; +use crate::codegen::platform::Arch; + +/// assoc_diff_intersect: filter hash1 entries by (key, value) membership in hash2. +/// Input: x0 = hash1, x1 = hash2, x2 = mode (0 = diff, 1 = intersect) +/// Output: x0 = new owned hash with the kept entries (string values persisted as independent copies, other heap-backed values retained) +/// +/// For each hash1 entry, looks up the key in hash2 and compares the values with PHP +/// string-cast equality (`__rt_mixed_from_value` -> `__rt_mixed_cast_string` -> `__rt_str_eq`), +/// releasing both temporary boxes afterward. diff keeps entries whose pair is NOT in hash2; +/// intersect keeps entries whose pair IS in hash2. 
+pub fn emit_assoc_diff_intersect(emitter: &mut Emitter) { + if emitter.target.arch == Arch::X86_64 { + emit_assoc_diff_intersect_linux_x86_64(emitter); + return; + } + + emitter.blank(); + emitter.comment("--- runtime: assoc_diff_intersect ---"); + emitter.label_global("__rt_assoc_diff_intersect"); + emitter.instruction("sub sp, sp, #160"); // allocate the diff/intersect stack frame + emitter.instruction("stp x29, x30, [sp, #144]"); // save frame pointer and return address + emitter.instruction("add x29, sp, #144"); // set up the new frame pointer + emitter.instruction("str x0, [sp, #0]"); // save hash1 pointer + emitter.instruction("str x1, [sp, #8]"); // save hash2 pointer + emitter.instruction("str x2, [sp, #24]"); // save mode (0 = diff, 1 = intersect) + emitter.instruction("ldr x9, [sp, #0]"); // reload hash1 pointer + emitter.instruction("ldr x0, [x9, #8]"); // x0 = hash1 capacity for the result hash + emitter.instruction("ldr x1, [x9, #16]"); // x1 = hash1 value_type summary + emitter.instruction("bl __rt_hash_new"); // create the result hash table, x0 = result + emitter.instruction("str x0, [sp, #16]"); // save the result hash pointer + emitter.instruction("str xzr, [sp, #32]"); // iterator cursor = 0 (start from hash1 head) + emitter.label("__rt_assoc_diff_intersect_loop"); + emitter.instruction("ldr x0, [sp, #0]"); // x0 = hash1 pointer + emitter.instruction("ldr x1, [sp, #32]"); // x1 = current iterator cursor + emitter.instruction("bl __rt_hash_iter_next"); // next hash1 entry: x0=cursor,x1=kptr,x2=klen,x3=vlo,x4=vhi,x5=vtag + emitter.instruction("cmn x0, #1"); // has iteration reached the end (cursor == -1)? 
+ emitter.instruction("b.eq __rt_assoc_diff_intersect_done"); // stop once every hash1 entry has been visited + emitter.instruction("str x0, [sp, #32]"); // save the next iterator cursor + emitter.instruction("str x1, [sp, #40]"); // save key pointer + emitter.instruction("str x2, [sp, #48]"); // save key length + emitter.instruction("str x3, [sp, #56]"); // save hash1 value low word + emitter.instruction("str x4, [sp, #64]"); // save hash1 value high word + emitter.instruction("str x5, [sp, #72]"); // save hash1 value runtime tag + emitter.instruction("ldr x0, [sp, #8]"); // x0 = hash2 pointer + emitter.instruction("ldr x1, [sp, #40]"); // x1 = key low word (key pointer or integer key) + emitter.instruction("ldr x2, [sp, #48]"); // x2 = key high word (-1 marks an integer key) + emitter.instruction("bl __rt_hash_get"); // look up the key in hash2: x0=found,x1=vlo,x2=vhi,x3=vtag + emitter.instruction("str x1, [sp, #80]"); // save hash2 value low word + emitter.instruction("str x2, [sp, #88]"); // save hash2 value high word + emitter.instruction("str x3, [sp, #96]"); // save hash2 value runtime tag + emitter.instruction("cbz x0, __rt_assoc_diff_intersect_nomatch"); // absent key cannot form a matching pair + emitter.comment("-- compare the two values by PHP string cast --"); + emitter.instruction("ldr x0, [sp, #72]"); // hash1 value runtime tag + emitter.instruction("ldr x1, [sp, #56]"); // hash1 value low word + emitter.instruction("ldr x2, [sp, #64]"); // hash1 value high word + emitter.instruction("bl __rt_mixed_from_value"); // box the hash1 value, x0 = box1 + emitter.instruction("str x0, [sp, #104]"); // save box1 for later release + emitter.instruction("bl __rt_mixed_cast_string"); // cast box1 to string: x1=ptr, x2=len + emitter.instruction("str x1, [sp, #120]"); // save the hash1 value string pointer + emitter.instruction("str x2, [sp, #128]"); // save the hash1 value string length + emitter.instruction("ldr x0, [sp, #96]"); // hash2 value runtime tag + 
emitter.instruction("ldr x1, [sp, #80]"); // hash2 value low word + emitter.instruction("ldr x2, [sp, #88]"); // hash2 value high word + emitter.instruction("bl __rt_mixed_from_value"); // box the hash2 value, x0 = box2 + emitter.instruction("str x0, [sp, #112]"); // save box2 for later release + emitter.instruction("bl __rt_mixed_cast_string"); // cast box2 to string: x1=ptr, x2=len + emitter.instruction("mov x3, x1"); // move the hash2 string pointer into the str_eq right operand + emitter.instruction("mov x4, x2"); // move the hash2 string length into the str_eq right operand + emitter.instruction("ldr x1, [sp, #120]"); // reload the hash1 string pointer as the str_eq left operand + emitter.instruction("ldr x2, [sp, #128]"); // reload the hash1 string length as the str_eq left operand + emitter.instruction("bl __rt_str_eq"); // compare the two cast strings, x0 = equal + emitter.instruction("str x0, [sp, #136]"); // save the value-equality result across the box releases + emitter.instruction("ldr x0, [sp, #104]"); // reload box1 for release + emitter.instruction("bl __rt_decref_mixed"); // release the temporary hash1 value box + emitter.instruction("ldr x0, [sp, #112]"); // reload box2 for release + emitter.instruction("bl __rt_decref_mixed"); // release the temporary hash2 value box + emitter.instruction("ldr x0, [sp, #136]"); // x0 = pair matches (found and string-equal values) + emitter.instruction("b __rt_assoc_diff_intersect_decide"); // decide whether to keep this entry + emitter.label("__rt_assoc_diff_intersect_nomatch"); + emitter.instruction("mov x0, #0"); // the pair does not match (key absent from hash2) + emitter.label("__rt_assoc_diff_intersect_decide"); + emitter.instruction("ldr x9, [sp, #24]"); // reload the mode selector + emitter.instruction("cbz x9, __rt_assoc_diff_intersect_diff"); // mode 0 selects difference semantics + emitter.instruction("cbz x0, __rt_assoc_diff_intersect_skip"); // intersect drops entries whose pair is not in hash2 + 
emitter.instruction("b __rt_assoc_diff_intersect_keep"); // intersect keeps matching pairs + emitter.label("__rt_assoc_diff_intersect_diff"); + emitter.instruction("cbnz x0, __rt_assoc_diff_intersect_skip"); // diff drops entries whose pair is in hash2 + emitter.label("__rt_assoc_diff_intersect_keep"); + emitter.instruction("ldr x9, [sp, #72]"); // reload the hash1 value runtime tag + emitter.instruction("cmp x9, #1"); // is the kept value a string? + emitter.instruction("b.eq __rt_assoc_diff_intersect_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp x9, #4"); // is the value below the heap-backed tag range? + emitter.instruction("b.lt __rt_assoc_diff_intersect_insert"); // scalar values need no retain + emitter.instruction("cmp x9, #7"); // is the value above the heap-backed tag range? + emitter.instruction("b.gt __rt_assoc_diff_intersect_insert"); // non-heap tags need no retain + emitter.instruction("ldr x0, [sp, #56]"); // load the kept heap-backed value low word + emitter.instruction("bl __rt_incref"); // retain the kept heap-backed value for the result owner + emitter.instruction("b __rt_assoc_diff_intersect_insert"); // continue to the insertion + emitter.label("__rt_assoc_diff_intersect_persist"); + emitter.instruction("ldr x1, [sp, #56]"); // string pointer to persist + emitter.instruction("ldr x2, [sp, #64]"); // string length to persist + emitter.instruction("bl __rt_str_persist"); // copy the string into an independent heap block, x1 = new pointer + emitter.instruction("str x1, [sp, #56]"); // store the persisted string pointer + emitter.instruction("str x2, [sp, #64]"); // store the persisted string length + emitter.label("__rt_assoc_diff_intersect_insert"); + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result hash pointer + emitter.instruction("ldr x1, [sp, #40]"); // reload key pointer + emitter.instruction("ldr x2, [sp, #48]"); // reload key length + emitter.instruction("ldr x3, [sp, #56]"); // reload value 
low word + emitter.instruction("ldr x4, [sp, #64]"); // reload value high word + emitter.instruction("ldr x5, [sp, #72]"); // reload value runtime tag + emitter.instruction("bl __rt_hash_set"); // insert the kept entry into the result hash + emitter.instruction("str x0, [sp, #16]"); // update the result pointer after possible reallocation + emitter.label("__rt_assoc_diff_intersect_skip"); + emitter.instruction("b __rt_assoc_diff_intersect_loop"); // continue with the next hash1 entry + emitter.label("__rt_assoc_diff_intersect_done"); + emitter.instruction("ldr x0, [sp, #16]"); // x0 = result hash pointer + emitter.instruction("ldp x29, x30, [sp, #144]"); // restore frame pointer and return address + emitter.instruction("add sp, sp, #160"); // deallocate the stack frame + emitter.instruction("ret"); // return the result hash in x0 +} + +/// x86_64 Linux implementation of `__rt_assoc_diff_intersect`. +/// Input: rdi = hash1, rsi = hash2, rdx = mode (0 = diff, 1 = intersect) +/// Output: rax = new owned hash with the kept entries +fn emit_assoc_diff_intersect_linux_x86_64(emitter: &mut Emitter) { + emitter.blank(); + emitter.comment("--- runtime: assoc_diff_intersect ---"); + emitter.label_global("__rt_assoc_diff_intersect"); + emitter.instruction("push rbp"); // preserve the caller frame pointer + emitter.instruction("mov rbp, rsp"); // establish a stable frame base + emitter.instruction("sub rsp, 160"); // reserve local spill slots for the filter loop state + emitter.instruction("mov QWORD PTR [rbp - 8], rdi"); // save hash1 pointer + emitter.instruction("mov QWORD PTR [rbp - 16], rsi"); // save hash2 pointer + emitter.instruction("mov QWORD PTR [rbp - 32], rdx"); // save mode (0 = diff, 1 = intersect) + emitter.instruction("mov r10, QWORD PTR [rbp - 8]"); // reload hash1 pointer + emitter.instruction("mov rdi, QWORD PTR [r10 + 8]"); // rdi = hash1 capacity for the result hash + emitter.instruction("mov rsi, QWORD PTR [r10 + 16]"); // rsi = hash1 value_type summary 
+ emitter.instruction("call __rt_hash_new"); // create the result hash table, rax = result + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // save the result hash pointer + emitter.instruction("mov QWORD PTR [rbp - 40], 0"); // iterator cursor = 0 (start from hash1 head) + emitter.label("__rt_assoc_diff_intersect_loop"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 8]"); // rdi = hash1 pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 40]"); // rsi = current iterator cursor + emitter.instruction("call __rt_hash_iter_next"); // next hash1 entry: rax=cursor,rdi=kptr,rdx=klen,rcx=vlo,r8=vhi,r9=vtag + emitter.instruction("cmp rax, -1"); // has iteration reached the end? + emitter.instruction("je __rt_assoc_diff_intersect_done"); // stop once every hash1 entry has been visited + emitter.instruction("mov QWORD PTR [rbp - 40], rax"); // save the next iterator cursor + emitter.instruction("mov QWORD PTR [rbp - 48], rdi"); // save key pointer + emitter.instruction("mov QWORD PTR [rbp - 56], rdx"); // save key length + emitter.instruction("mov QWORD PTR [rbp - 64], rcx"); // save hash1 value low word + emitter.instruction("mov QWORD PTR [rbp - 72], r8"); // save hash1 value high word + emitter.instruction("mov QWORD PTR [rbp - 80], r9"); // save hash1 value runtime tag + emitter.instruction("mov rdi, QWORD PTR [rbp - 16]"); // rdi = hash2 pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 48]"); // rsi = key low word + emitter.instruction("mov rdx, QWORD PTR [rbp - 56]"); // rdx = key high word (-1 marks an integer key) + emitter.instruction("call __rt_hash_get"); // look up the key in hash2: rax=found,rdi=vlo,rsi=vhi,rcx=vtag + emitter.instruction("mov QWORD PTR [rbp - 88], rdi"); // save hash2 value low word + emitter.instruction("mov QWORD PTR [rbp - 96], rsi"); // save hash2 value high word + emitter.instruction("mov QWORD PTR [rbp - 104], rcx"); // save hash2 value runtime tag + emitter.instruction("test rax, rax"); // was the key present in 
hash2? + emitter.instruction("je __rt_assoc_diff_intersect_nomatch"); // absent key cannot form a matching pair + emitter.comment("-- compare the two values by PHP string cast --"); + emitter.instruction("mov rax, QWORD PTR [rbp - 80]"); // hash1 value runtime tag + emitter.instruction("mov rdi, QWORD PTR [rbp - 64]"); // hash1 value low word + emitter.instruction("mov rsi, QWORD PTR [rbp - 72]"); // hash1 value high word + emitter.instruction("call __rt_mixed_from_value"); // box the hash1 value, rax = box1 + emitter.instruction("mov QWORD PTR [rbp - 112], rax"); // save box1 for later release + emitter.instruction("mov rdi, rax"); // pass box1 to the string cast helper + emitter.instruction("call __rt_mixed_cast_string"); // cast box1 to string: rax=ptr, rdx=len + emitter.instruction("mov QWORD PTR [rbp - 128], rax"); // save the hash1 value string pointer + emitter.instruction("mov QWORD PTR [rbp - 136], rdx"); // save the hash1 value string length + emitter.instruction("mov rax, QWORD PTR [rbp - 104]"); // hash2 value runtime tag + emitter.instruction("mov rdi, QWORD PTR [rbp - 88]"); // hash2 value low word + emitter.instruction("mov rsi, QWORD PTR [rbp - 96]"); // hash2 value high word + emitter.instruction("call __rt_mixed_from_value"); // box the hash2 value, rax = box2 + emitter.instruction("mov QWORD PTR [rbp - 120], rax"); // save box2 for later release + emitter.instruction("mov rdi, rax"); // pass box2 to the string cast helper + emitter.instruction("call __rt_mixed_cast_string"); // cast box2 to string: rax=ptr, rdx=len + emitter.instruction("mov rcx, rdx"); // move the hash2 string length into the str_eq right length + emitter.instruction("mov rdx, rax"); // move the hash2 string pointer into the str_eq right pointer + emitter.instruction("mov rdi, QWORD PTR [rbp - 128]"); // reload the hash1 string pointer as the str_eq left pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 136]"); // reload the hash1 string length as the str_eq left length + 
emitter.instruction("call __rt_str_eq"); // compare the two cast strings, rax = equal + emitter.instruction("mov QWORD PTR [rbp - 144], rax"); // save the value-equality result across the box releases + emitter.instruction("mov rdi, QWORD PTR [rbp - 112]"); // reload box1 for release + emitter.instruction("call __rt_decref_mixed"); // release the temporary hash1 value box + emitter.instruction("mov rdi, QWORD PTR [rbp - 120]"); // reload box2 for release + emitter.instruction("call __rt_decref_mixed"); // release the temporary hash2 value box + emitter.instruction("mov rax, QWORD PTR [rbp - 144]"); // rax = pair matches (found and string-equal values) + emitter.instruction("jmp __rt_assoc_diff_intersect_decide"); // decide whether to keep this entry + emitter.label("__rt_assoc_diff_intersect_nomatch"); + emitter.instruction("xor eax, eax"); // the pair does not match (key absent from hash2) + emitter.label("__rt_assoc_diff_intersect_decide"); + emitter.instruction("mov r10, QWORD PTR [rbp - 32]"); // reload the mode selector + emitter.instruction("test r10, r10"); // is the mode difference (0)? + emitter.instruction("je __rt_assoc_diff_intersect_diff"); // mode 0 selects difference semantics + emitter.instruction("test rax, rax"); // did the pair match for intersect? + emitter.instruction("je __rt_assoc_diff_intersect_skip"); // intersect drops entries whose pair is not in hash2 + emitter.instruction("jmp __rt_assoc_diff_intersect_keep"); // intersect keeps matching pairs + emitter.label("__rt_assoc_diff_intersect_diff"); + emitter.instruction("test rax, rax"); // did the pair match for diff? + emitter.instruction("jne __rt_assoc_diff_intersect_skip"); // diff drops entries whose pair is in hash2 + emitter.label("__rt_assoc_diff_intersect_keep"); + emitter.instruction("mov r10, QWORD PTR [rbp - 80]"); // reload the hash1 value runtime tag + emitter.instruction("cmp r10, 1"); // is the kept value a string? 
+ emitter.instruction("je __rt_assoc_diff_intersect_persist"); // strings are persisted as an independent copy + emitter.instruction("cmp r10, 4"); // is the value below the heap-backed tag range? + emitter.instruction("jl __rt_assoc_diff_intersect_insert"); // scalar values need no retain + emitter.instruction("cmp r10, 7"); // is the value above the heap-backed tag range? + emitter.instruction("jg __rt_assoc_diff_intersect_insert"); // non-heap tags need no retain + emitter.instruction("mov rdi, QWORD PTR [rbp - 64]"); // load the kept heap-backed value low word + emitter.instruction("call __rt_incref"); // retain the kept heap-backed value for the result owner + emitter.instruction("jmp __rt_assoc_diff_intersect_insert"); // continue to the insertion + emitter.label("__rt_assoc_diff_intersect_persist"); + emitter.instruction("mov rax, QWORD PTR [rbp - 64]"); // string pointer to persist + emitter.instruction("mov rdx, QWORD PTR [rbp - 72]"); // string length to persist + emitter.instruction("call __rt_str_persist"); // copy the string into an independent heap block, rax = new pointer + emitter.instruction("mov QWORD PTR [rbp - 64], rax"); // store the persisted string pointer + emitter.instruction("mov QWORD PTR [rbp - 72], rdx"); // store the persisted string length + emitter.label("__rt_assoc_diff_intersect_insert"); + emitter.instruction("mov rdi, QWORD PTR [rbp - 24]"); // rdi = result hash pointer + emitter.instruction("mov rsi, QWORD PTR [rbp - 48]"); // reload key pointer + emitter.instruction("mov rdx, QWORD PTR [rbp - 56]"); // reload key length + emitter.instruction("mov rcx, QWORD PTR [rbp - 64]"); // reload value low word + emitter.instruction("mov r8, QWORD PTR [rbp - 72]"); // reload value high word + emitter.instruction("mov r9, QWORD PTR [rbp - 80]"); // reload value runtime tag + emitter.instruction("call __rt_hash_set"); // insert the kept entry into the result hash + emitter.instruction("mov QWORD PTR [rbp - 24], rax"); // update the result 
pointer after possible reallocation + emitter.label("__rt_assoc_diff_intersect_skip"); + emitter.instruction("jmp __rt_assoc_diff_intersect_loop"); // continue with the next hash1 entry + emitter.label("__rt_assoc_diff_intersect_done"); + emitter.instruction("mov rax, QWORD PTR [rbp - 24]"); // rax = result hash pointer + emitter.instruction("add rsp, 160"); // release the local spill slots + emitter.instruction("pop rbp"); // restore the caller frame pointer + emitter.instruction("ret"); // return the result hash in rax +} + diff --git a/src/codegen/runtime/arrays/mod.rs b/src/codegen/runtime/arrays/mod.rs index 25b8480ea..b52bb2699 100644 --- a/src/codegen/runtime/arrays/mod.rs +++ b/src/codegen/runtime/arrays/mod.rs @@ -20,12 +20,17 @@ mod array_clone_shallow; mod array_diff; mod array_diff_refcounted; mod array_diff_key; +mod array_edge_key; mod array_ensure_unique; +mod array_is_list; +mod assoc_diff_intersect; mod array_fill; mod array_fill_keys; mod array_fill_keys_refcounted; mod array_fill_refcounted; mod array_filter; +mod array_find_any_all; +mod array_udiff_uintersect; mod array_filter_refcounted; mod array_flip; mod array_flip_string; @@ -40,6 +45,8 @@ mod array_map; mod array_map_mixed; mod array_map_str; mod array_merge; +mod array_merge_recursive; +mod array_multisort; mod array_merge_into; mod array_merge_into_refcounted; mod array_merge_refcounted; @@ -54,6 +61,8 @@ mod array_rand; mod random_u32; mod random_uniform; mod array_reduce; +mod array_replace; +mod array_replace_recursive; mod array_reverse; mod array_reverse_refcounted; mod array_search; @@ -63,12 +72,14 @@ mod array_slice_refcounted; mod array_splice; mod array_splice_refcounted; mod array_sum; +mod array_to_hash; mod array_to_mixed; mod array_union; mod array_unique; mod array_unique_refcounted; mod array_unshift; mod array_walk; +mod array_walk_recursive; mod asort; mod decref_any; mod decref_array; @@ -165,6 +176,12 @@ pub use array_fill_refcounted::emit_array_fill_refcounted; /// 
Emit refcounted array fill helper. pub use array_filter::emit_array_filter; /// Emit array filter helper. +pub use array_find_any_all::emit_array_find_any_all; +/// Emit array find/any/all predicate helper. +pub use array_udiff_uintersect::emit_array_udiff_uintersect; +/// Emit array udiff/uintersect comparator helper. +pub use array_multisort::emit_array_multisort; +/// Emit array multisort (parallel in-place sort) helper. pub use array_filter_refcounted::emit_array_filter_refcounted; /// Emit refcounted array filter helper. pub use array_flip::emit_array_flip; @@ -238,8 +255,24 @@ pub use array_splice_refcounted::emit_array_splice_refcounted; /// Emit refcounted array splice helper. pub use array_sum::emit_array_sum; /// Emit array sum helper. +pub use array_edge_key::emit_array_edge_key; +/// Emit array first/last key helper for array_key_first / array_key_last. +pub use array_is_list::emit_array_is_list; +/// Emit array is-list predicate helper. +pub use array_replace::emit_array_replace; +/// Emit array replace (right-wins key merge) helper. +pub use array_replace_recursive::emit_array_replace_recursive; +/// Emit recursive array replace helper. +pub use array_merge_recursive::emit_amr_box_value; +/// Emit value-to-list wrapper helper for array_merge_recursive. +pub use array_merge_recursive::emit_array_merge_recursive; +/// Emit recursive array merge helper. +pub use assoc_diff_intersect::emit_assoc_diff_intersect; +/// Emit associative diff/intersect (key+value) helper. pub use array_to_mixed::emit_array_to_mixed; /// Emit array-to-Mixed conversion helper. +pub use array_to_hash::emit_array_to_hash; +/// Emit indexed-array-to-hash conversion helper (lets hash-based builtins accept indexed inputs). pub use array_union::emit_array_union; /// Emit array union helper. pub use array_unique::emit_array_unique; @@ -250,6 +283,8 @@ pub use array_unshift::emit_array_unshift; /// Emit array unshift (prepend) helper. 
pub use array_walk::emit_array_walk; /// Emit array walk helper. +pub use array_walk_recursive::emit_array_walk_recursive; +/// Emit recursive array walk helper. pub use asort::emit_asort; /// Emit associative sort helper. pub use decref_any::emit_decref_any; diff --git a/src/codegen/runtime/emitters.rs b/src/codegen/runtime/emitters.rs index 67a47fef9..9c99d7040 100644 --- a/src/codegen/runtime/emitters.rs +++ b/src/codegen/runtime/emitters.rs @@ -201,11 +201,18 @@ pub(crate) fn emit_runtime(emitter: &mut Emitter, features: RuntimeFeatures) { arrays::emit_array_reverse(emitter); arrays::emit_array_reverse_refcounted(emitter); arrays::emit_array_sum(emitter); + arrays::emit_array_is_list(emitter); + arrays::emit_array_edge_key(emitter); arrays::emit_array_product(emitter); arrays::emit_array_shift(emitter); arrays::emit_array_unshift(emitter); arrays::emit_array_merge(emitter); arrays::emit_array_merge_refcounted(emitter); + arrays::emit_array_replace(emitter); + arrays::emit_array_replace_recursive(emitter); + arrays::emit_amr_box_value(emitter); + arrays::emit_array_merge_recursive(emitter); + arrays::emit_array_to_hash(emitter); arrays::emit_array_slice(emitter); arrays::emit_array_slice_refcounted(emitter); arrays::emit_range(emitter); @@ -237,6 +244,7 @@ pub(crate) fn emit_runtime(emitter: &mut Emitter, features: RuntimeFeatures) { arrays::emit_array_splice_refcounted(emitter); arrays::emit_array_diff_key(emitter); arrays::emit_array_intersect_key(emitter); + arrays::emit_assoc_diff_intersect(emitter); arrays::emit_asort(emitter); arrays::emit_ksort(emitter); arrays::emit_natsort(emitter); @@ -245,9 +253,13 @@ pub(crate) fn emit_runtime(emitter: &mut Emitter, features: RuntimeFeatures) { arrays::emit_array_map_str(emitter); arrays::emit_array_map_str_owned(emitter); arrays::emit_array_filter(emitter); + arrays::emit_array_find_any_all(emitter); + arrays::emit_array_udiff_uintersect(emitter); + arrays::emit_array_multisort(emitter); 
arrays::emit_array_filter_refcounted(emitter); arrays::emit_array_reduce(emitter); arrays::emit_array_walk(emitter); + arrays::emit_array_walk_recursive(emitter); arrays::emit_usort(emitter); arrays::emit_array_to_mixed(emitter); arrays::emit_array_merge_into(emitter); diff --git a/src/codegen/runtime/x86_minimal.rs b/src/codegen/runtime/x86_minimal.rs index ec0f6babe..577c46739 100644 --- a/src/codegen/runtime/x86_minimal.rs +++ b/src/codegen/runtime/x86_minimal.rs @@ -142,6 +142,11 @@ pub(super) fn emit_runtime_linux_x86_64_minimal( arrays::emit_array_intersect_refcounted(emitter); arrays::emit_array_merge(emitter); arrays::emit_array_merge_refcounted(emitter); + arrays::emit_array_replace(emitter); + arrays::emit_array_replace_recursive(emitter); + arrays::emit_amr_box_value(emitter); + arrays::emit_array_merge_recursive(emitter); + arrays::emit_array_to_hash(emitter); arrays::emit_array_pad(emitter); arrays::emit_array_pad_refcounted(emitter); arrays::emit_array_product(emitter); @@ -154,6 +159,8 @@ pub(super) fn emit_runtime_linux_x86_64_minimal( arrays::emit_array_splice(emitter); arrays::emit_array_splice_refcounted(emitter); arrays::emit_array_sum(emitter); + arrays::emit_array_is_list(emitter); + arrays::emit_array_edge_key(emitter); arrays::emit_array_chunk(emitter); arrays::emit_array_chunk_refcounted(emitter); arrays::emit_array_combine(emitter); @@ -168,9 +175,13 @@ pub(super) fn emit_runtime_linux_x86_64_minimal( arrays::emit_array_map_str(emitter); arrays::emit_array_map_str_owned(emitter); arrays::emit_array_filter(emitter); + arrays::emit_array_find_any_all(emitter); + arrays::emit_array_udiff_uintersect(emitter); + arrays::emit_array_multisort(emitter); arrays::emit_array_filter_refcounted(emitter); arrays::emit_array_reduce(emitter); arrays::emit_array_walk(emitter); + arrays::emit_array_walk_recursive(emitter); arrays::emit_usort(emitter); arrays::emit_hash_fnv1a(emitter); arrays::emit_hash_key_hash(emitter); @@ -199,6 +210,7 @@ pub(super) fn 
emit_runtime_linux_x86_64_minimal( arrays::emit_mixed_is_empty(emitter); arrays::emit_array_diff_key(emitter); arrays::emit_array_intersect_key(emitter); + arrays::emit_assoc_diff_intersect(emitter); arrays::emit_decref_hash(emitter); arrays::emit_hash_free_deep(emitter); arrays::emit_decref_any(emitter); @@ -412,6 +424,16 @@ mod tests { assert!(asm.contains("__rt_array_diff:\n")); assert!(asm.contains("__rt_array_intersect:\n")); assert!(asm.contains("__rt_array_merge:\n")); + assert!(asm.contains("__rt_array_replace:\n")); + assert!(asm.contains("__rt_assoc_diff_intersect:\n")); + assert!(asm.contains("__rt_array_replace_recursive:\n")); + assert!(asm.contains("__rt_array_walk_recursive:\n")); + assert!(asm.contains("__rt_array_merge_recursive:\n")); + assert!(asm.contains("__rt_amr_box_value:\n")); + assert!(asm.contains("__rt_array_to_hash:\n")); + assert!(asm.contains("__rt_array_find_any_all:\n")); + assert!(asm.contains("__rt_array_udiff_uintersect:\n")); + assert!(asm.contains("__rt_array_multisort:\n")); assert!(asm.contains("__rt_array_pad:\n")); assert!(asm.contains("__rt_array_product:\n")); assert!(asm.contains("__rt_array_rand:\n")); @@ -420,6 +442,8 @@ mod tests { assert!(asm.contains("__rt_array_slice:\n")); assert!(asm.contains("__rt_array_splice:\n")); assert!(asm.contains("__rt_array_sum:\n")); + assert!(asm.contains("__rt_array_is_list:\n")); + assert!(asm.contains("__rt_array_edge_key:\n")); assert!(asm.contains("__rt_array_chunk:\n")); assert!(asm.contains("__rt_array_combine:\n")); assert!(asm.contains("__rt_array_flip:\n")); diff --git a/src/optimize/effects/builtins.rs b/src/optimize/effects/builtins.rs index 3a4a4ca0d..560ea78a1 100644 --- a/src/optimize/effects/builtins.rs +++ b/src/optimize/effects/builtins.rs @@ -145,6 +145,14 @@ pub(super) fn is_pure_non_throwing_builtin(name: &str) -> bool { | "array_intersect" | "array_diff_key" | "array_intersect_key" + | "array_diff_assoc" + | "array_intersect_assoc" + | "array_merge_recursive" + | 
"array_replace" + | "array_replace_recursive" + | "array_is_list" + | "array_key_first" + | "array_key_last" | "range" ) // Note: json_encode / json_decode / json_validate / json_last_error / diff --git a/src/types/checker/builtins/arrays.rs b/src/types/checker/builtins/arrays.rs index 9b564e49b..b8f9116c1 100644 --- a/src/types/checker/builtins/arrays.rs +++ b/src/types/checker/builtins/arrays.rs @@ -195,6 +195,49 @@ pub(super) fn check_builtin( _ => Err(CompileError::new(span, "array_flip() argument must be array")), } } + "array_is_list" => { + if args.len() != 1 { + return Err(CompileError::new(span, "array_is_list() takes exactly 1 argument")); + } + let ty = checker.infer_type(&args[0], env)?; + if !matches!(ty, PhpType::Array(_) | PhpType::AssocArray { .. } | PhpType::Mixed) { + return Err(CompileError::new(span, "array_is_list() argument must be array")); + } + Ok(Some(PhpType::Bool)) + } + "array_key_first" | "array_key_last" => { + if args.len() != 1 { + return Err(CompileError::new( + span, + &format!("{}() takes exactly 1 argument", name), + )); + } + let ty = checker.infer_type(&args[0], env)?; + if !matches!(ty, PhpType::Array(_) | PhpType::AssocArray { .. 
} | PhpType::Mixed) { + return Err(CompileError::new( + span, + &format!("{}() argument must be array", name), + )); + } + Ok(Some(PhpType::Mixed)) + } + "array_multisort" => { + if args.len() != 2 { + return Err(CompileError::new( + span, + "array_multisort() takes exactly 2 arguments", + )); + } + let ty1 = checker.infer_type(&args[0], env)?; + let ty2 = checker.infer_type(&args[1], env)?; + if !matches!(ty1, PhpType::Array(_)) || !matches!(ty2, PhpType::Array(_)) { + return Err(CompileError::new( + span, + "array_multisort() arguments must be indexed arrays", + )); + } + Ok(Some(PhpType::Bool)) + } "array_shift" => { if args.len() != 1 { return Err(CompileError::new(span, "array_shift() takes exactly 1 argument")); @@ -296,6 +339,77 @@ pub(super) fn check_builtin( } Ok(Some(ty1)) } + "array_replace" | "array_replace_recursive" => { + if args.len() != 2 { + return Err(CompileError::new( + span, + &format!("{}() takes exactly 2 arguments", name), + )); + } + let ty1 = checker.infer_type(&args[0], env)?; + let ty2 = checker.infer_type(&args[1], env)?; + let accepted = |t: &PhpType| { + matches!(t, PhpType::AssocArray { .. }) || t.is_scalar_indexed_array() + }; + if !accepted(&ty1) || !accepted(&ty2) { + return Err(CompileError::new( + span, + &format!( + "{}() arguments must be associative arrays or indexed arrays of scalars", + name + ), + )); + } + Ok(Some(PhpType::two_input_hash_result(&ty1, &ty2))) + } + "array_diff_assoc" | "array_intersect_assoc" => { + if args.len() != 2 { + return Err(CompileError::new( + span, + &format!("{}() takes exactly 2 arguments", name), + )); + } + let ty1 = checker.infer_type(&args[0], env)?; + let ty2 = checker.infer_type(&args[1], env)?; + let accepted = |t: &PhpType| { + matches!(t, PhpType::AssocArray { .. 
}) || t.is_scalar_indexed_array() + }; + if !accepted(&ty1) || !accepted(&ty2) { + return Err(CompileError::new( + span, + &format!( + "{}() arguments must be associative arrays or indexed arrays of scalars", + name + ), + )); + } + Ok(Some(PhpType::two_input_hash_result(&ty1, &ty2))) + } + "array_merge_recursive" => { + if args.len() != 2 { + return Err(CompileError::new( + span, + "array_merge_recursive() takes exactly 2 arguments", + )); + } + let ty1 = checker.infer_type(&args[0], env)?; + let ty2 = checker.infer_type(&args[1], env)?; + let accepted = |t: &PhpType| { + matches!(t, PhpType::AssocArray { .. }) || t.is_scalar_indexed_array() + }; + if !accepted(&ty1) || !accepted(&ty2) { + return Err(CompileError::new( + span, + "array_merge_recursive() arguments must be associative arrays or indexed arrays of scalars", + )); + } + // Scalar collisions combine into lists, so the result value type is always Mixed; the + // key widens to Mixed when the two inputs disagree. + Ok(Some(PhpType::AssocArray { + key: Box::new(PhpType::widen(ty1.hash_key_type(), ty2.hash_key_type())), + value: Box::new(PhpType::Mixed), + })) + } "array_unshift" => { if args.len() != 2 { return Err(CompileError::new( diff --git a/src/types/checker/builtins/callables.rs b/src/types/checker/builtins/callables.rs index 721d862de..ab40571af 100644 --- a/src/types/checker/builtins/callables.rs +++ b/src/types/checker/builtins/callables.rs @@ -834,6 +834,68 @@ pub(super) fn check_builtin( )), } } + "array_find" | "array_any" | "array_all" => { + if args.len() != 2 { + return Err(CompileError::new( + span, + &format!("{}() takes exactly 2 arguments", name), + )); + } + for arg in args { + checker.infer_type(arg, env)?; + } + let arr_ty = checker.infer_type(&args[0], env)?; + if !matches!(arr_ty, PhpType::Array(_)) { + return Err(CompileError::new( + span, + &format!("{}() first argument must be array", name), + )); + } + let dummy_args = vec![dummy_arg_for_array_scalar_elem(&arr_ty, span)]; + 
check_callback_builtin_call( + checker, + &args[1], + &dummy_args, + span, + env, + &format!("{}() callback", name), + )?; + // array_find returns the matching element or null (Mixed); any/all return bool. + if name == "array_find" { + Ok(Some(PhpType::Mixed)) + } else { + Ok(Some(PhpType::Bool)) + } + } + "array_udiff" | "array_uintersect" => { + if args.len() != 3 { + return Err(CompileError::new( + span, + &format!("{}() takes exactly 3 arguments", name), + )); + } + for arg in args { + checker.infer_type(arg, env)?; + } + let arr_ty = checker.infer_type(&args[0], env)?; + if !matches!(arr_ty, PhpType::Array(_)) { + return Err(CompileError::new( + span, + &format!("{}() first argument must be array", name), + )); + } + let cmp_arg = dummy_arg_for_array_scalar_elem(&arr_ty, span); + let dummy_args = vec![cmp_arg.clone(), cmp_arg]; + check_callback_builtin_call( + checker, + &args[2], + &dummy_args, + span, + env, + &format!("{}() comparator", name), + )?; + Ok(Some(arr_ty)) + } "array_reduce" => { if args.len() != 3 { return Err(CompileError::new( @@ -859,9 +921,12 @@ pub(super) fn check_builtin( )?; Ok(Some(PhpType::Int)) } - "array_walk" => { + "array_walk" | "array_walk_recursive" => { if args.len() != 2 { - return Err(CompileError::new(span, "array_walk() takes exactly 2 arguments")); + return Err(CompileError::new( + span, + &format!("{}() takes exactly 2 arguments", name), + )); } for arg in args { checker.infer_type(arg, env)?; @@ -874,7 +939,7 @@ pub(super) fn check_builtin( &dummy_args, span, env, - "array_walk() callback", + &format!("{}() callback", name), )?; Ok(Some(PhpType::Void)) } diff --git a/src/types/checker/builtins/catalog.rs b/src/types/checker/builtins/catalog.rs index 03b34c220..7c80cf100 100644 --- a/src/types/checker/builtins/catalog.rs +++ b/src/types/checker/builtins/catalog.rs @@ -13,37 +13,52 @@ const SUPPORTED_BUILTIN_FUNCTIONS: &[&str] = &[ "abs", "acos", "addslashes", + "array_all", + "array_any", "array_chunk", "array_column", 
"array_combine", "array_diff", + "array_diff_assoc", "array_diff_key", "array_fill", "array_fill_keys", "array_filter", + "array_find", "array_flip", "array_intersect", + "array_intersect_assoc", "array_intersect_key", + "array_is_list", "array_key_exists", + "array_key_first", + "array_key_last", "array_keys", "array_map", "array_merge", + "array_merge_recursive", + "array_multisort", "array_pad", "array_pop", "array_product", "array_push", "array_rand", "array_reduce", + "array_replace", + "array_replace_recursive", "array_reverse", "array_search", "array_shift", "array_slice", "array_splice", "array_sum", + "array_udiff", + "array_uintersect", "array_unique", "array_unshift", "array_values", "array_walk", + "array_walk_recursive", "arsort", "asin", "asort", diff --git a/src/types/model.rs b/src/types/model.rs index e2f657446..1a503683e 100644 --- a/src/types/model.rs +++ b/src/types/model.rs @@ -44,6 +44,76 @@ impl PhpType { PhpType::Resource(Some("stream".to_string())) } + /// Returns this array type viewed as a hash (associative array). + /// + /// Indexed arrays (`Array(elem)`) become `AssocArray { key: Int, value: elem }`: the + /// hash-based builtins (`array_replace`, `array_diff_assoc`, ...) always produce an + /// integer-keyed hash from an indexed input. Associative arrays and every other type are + /// returned unchanged. + pub fn as_hash(&self) -> PhpType { + match self { + PhpType::Array(elem) => PhpType::AssocArray { + key: Box::new(PhpType::Int), + value: elem.clone(), + }, + other => other.clone(), + } + } + + /// Returns true if this is an indexed array of a scalar (int/float/bool) element type. + /// + /// The hash-based builtins accept such indexed inputs by converting them to integer-keyed + /// hashes; scalar elements are copied by value, so the converted temporaries are safe to + /// free. 
String/heap element indexed inputs are a follow-up (they hit x86-specific converter + /// and clone-shallow issues), so the checker restricts indexed inputs to scalar elements. + pub fn is_scalar_indexed_array(&self) -> bool { + matches!( + self, + PhpType::Array(elem) + if matches!(**elem, PhpType::Int | PhpType::Float | PhpType::Bool) + ) + } + + /// Returns the hash key type this array type contributes: `Int` for an indexed array, + /// the declared key for an associative array, `Int` otherwise. + pub fn hash_key_type(&self) -> PhpType { + match self { + PhpType::Array(_) => PhpType::Int, + PhpType::AssocArray { key, .. } => (**key).clone(), + _ => PhpType::Int, + } + } + + /// Returns the hash value type this array type contributes: the element type for an indexed + /// array, the declared value for an associative array, `Mixed` otherwise. + pub fn hash_value_type(&self) -> PhpType { + match self { + PhpType::Array(elem) => (**elem).clone(), + PhpType::AssocArray { value, .. } => (**value).clone(), + _ => PhpType::Mixed, + } + } + + /// Widens two types to a common type: the type itself when both agree, else `Mixed`. + pub fn widen(a: PhpType, b: PhpType) -> PhpType { + if a == b { + a + } else { + PhpType::Mixed + } + } + + /// Computes the result hash type for a two-input hash builtin (the `array_replace` / + /// `array_diff_assoc` family). The key and value each widen to `Mixed` when the two inputs + /// disagree, so a `foreach` over the result performs the correct runtime key/value dispatch + /// when an indexed input is mixed with a string-keyed associative input. + pub fn two_input_hash_result(t1: &PhpType, t2: &PhpType) -> PhpType { + PhpType::AssocArray { + key: Box::new(PhpType::widen(t1.hash_key_type(), t2.hash_key_type())), + value: Box::new(PhpType::widen(t1.hash_value_type(), t2.hash_value_type())), + } + } + /// Returns true if `expected` is compatible with `actual` for resource type matching. 
/// A typed resource (Some) is compatible with a generic resource (None), and two typed /// resources are compatible when their kind strings match. diff --git a/src/types/signatures.rs b/src/types/signatures.rs index a5865cbbb..9e0b47444 100644 --- a/src/types/signatures.rs +++ b/src/types/signatures.rs @@ -232,7 +232,8 @@ pub(crate) fn builtin_call_sig(name: &str) -> Option { "array_pop" | "array_shift" => Some(first_param_ref(fixed(&["array"]))), "array_keys" | "array_values" | "array_reverse" | "array_unique" | "array_flip" - | "array_sum" | "array_product" | "array_rand" => Some(fixed(&["array"])), + | "array_sum" | "array_product" | "array_rand" | "array_is_list" + | "array_key_first" | "array_key_last" => Some(fixed(&["array"])), "sort" | "rsort" | "shuffle" | "natsort" | "natcasesort" | "asort" | "arsort" | "ksort" | "krsort" => Some(first_param_ref(fixed(&["array"]))), "in_array" => Some(optional(&["needle", "haystack", "strict"], 2, vec![bool_lit(false)])), @@ -241,10 +242,14 @@ pub(crate) fn builtin_call_sig(name: &str) -> Option { Some(optional(&["needle", "haystack", "strict"], 2, vec![bool_lit(false)])) } "array_push" | "array_unshift" => Some(first_param_ref(variadic(&["array"], "values"))), - "array_merge" => Some(variadic(&[], "arrays")), - "array_diff" | "array_intersect" | "array_diff_key" | "array_intersect_key" => { + "array_merge" | "array_merge_recursive" => Some(variadic(&[], "arrays")), + "array_diff" | "array_intersect" | "array_diff_key" | "array_intersect_key" + | "array_diff_assoc" | "array_intersect_assoc" => { Some(variadic(&["array"], "arrays")) } + "array_replace" | "array_replace_recursive" => { + Some(fixed(&["array", "replacements"])) + } "array_combine" => Some(fixed(&["keys", "values"])), "array_fill_keys" => Some(fixed(&["keys", "value"])), "array_pad" => Some(fixed(&["array", "length", "value"])), @@ -268,12 +273,21 @@ pub(crate) fn builtin_call_sig(name: &str) -> Option { 1, vec![null_lit(), int_lit(0)], )), + "array_find" | 
"array_any" | "array_all" => Some(fixed(&["array", "callback"])), + "array_udiff" | "array_uintersect" => { + Some(fixed(&["array1", "array2", "callback"])) + } + "array_multisort" => { + let mut sig = fixed(&["array1", "array2"]); + sig.ref_params = vec![true, true]; + Some(sig) + } "array_reduce" => Some(optional( &["array", "callback", "initial"], 2, vec![null_lit()], )), - "array_walk" | "usort" | "uksort" | "uasort" => { + "array_walk" | "array_walk_recursive" | "usort" | "uksort" | "uasort" => { Some(first_param_ref(fixed(&["array", "callback"]))) } "call_user_func" => Some(variadic(&["callback"], "args")), diff --git a/tests/codegen/arrays/assoc_set_ops.rs b/tests/codegen/arrays/assoc_set_ops.rs new file mode 100644 index 000000000..7255974ea --- /dev/null +++ b/tests/codegen/arrays/assoc_set_ops.rs @@ -0,0 +1,433 @@ +//! Purpose: +//! Integration tests for associative-array set/merge builtins: `array_replace`. +//! +//! Called from: +//! - `cargo test` through Rust's test harness. +//! +//! Key details: +//! - Inline PHP fixtures are compiled to native binaries; assertions compare stdout. +//! - Covers in-place overwrite (keeping key position), appended keys, copy-on-write +//! non-mutation of the source, string values, and case-insensitive calls. + +use crate::support::*; + +/// Verifies array_replace() overwrites matching keys in place and appends new keys, +/// preserving the first array's key order. +/// Fixture: base {a:1,b:2} replaced by {b:9,c:3} → a=1;b=9;c=3 in that order. +#[test] +fn test_array_replace_overwrite_and_append() { + let out = compile_and_run( + r#"<?php +$base = ["a" => 1, "b" => 2]; +$over = ["b" => 9, "c" => 3]; +$r = array_replace($base, $over); +foreach ($r as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "a=1;b=9;c=3;"); +} + +/// Verifies array_replace() does not mutate the source array (copy-on-write). +/// Fixture: base {x:1} replaced by {x:5}; base["x"] stays 1, result["x"] is 5. 
+#[test] +fn test_array_replace_source_unchanged() { + let out = compile_and_run( + r#"<?php +$base = ["x" => 1]; +$over = ["x" => 5]; +$r = array_replace($base, $over); +echo $base["x"]; +echo $r["x"]; +"#, + ); + assert_eq!(out, "15"); +} + +/// Verifies array_replace() carries string values from both arrays into the result. +/// Fixture: base {name:"alice",role:"user"} replaced by {role:"admin"}. +#[test] +fn test_array_replace_string_values() { + let out = compile_and_run( + r#"<?php +$base = ["name" => "alice", "role" => "user"]; +$over = ["role" => "admin"]; +$r = array_replace($base, $over); +echo $r["name"]; +echo "-"; +echo $r["role"]; +"#, + ); + assert_eq!(out, "alice-admin"); +} + +/// Verifies array_replace() result count reflects merged distinct keys. +/// Fixture: {a:1,b:2} replaced by {b:9,c:3,d:4} → 4 distinct keys. +#[test] +fn test_array_replace_count() { + let out = compile_and_run( + r#"<?php +$r = array_replace(["a" => 1, "b" => 2], ["b" => 9, "c" => 3, "d" => 4]); +echo count($r); +"#, + ); + assert_eq!(out, "4"); +} + +/// Verifies array_replace() is callable case-insensitively, matching PHP builtin name rules. +/// Fixture: mixed-case spelling overwriting one key. +#[test] +fn test_array_replace_case_insensitive() { + let out = compile_and_run( + r#" 1], ["a" => 2]); +echo $r["a"]; +"#, + ); + assert_eq!(out, "2"); +} + +/// Verifies array_diff_assoc() keeps entries whose (key, value) pair is absent from the +/// second array: matching pair dropped, differing value kept, missing key kept. +/// Fixture: {a:1,b:2,c:3} vs {a:1,b:5} → b=2 (value differs) and c=3 (key absent) remain. 
+#[test] +fn test_array_diff_assoc_basic() { + let out = compile_and_run( + r#"<?php +$a = ["a" => 1, "b" => 2, "c" => 3]; +$b = ["a" => 1, "b" => 5]; +$r = array_diff_assoc($a, $b); +foreach ($r as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "b=2;c=3;"); +} + +/// Verifies array_diff_assoc() compares values by PHP string cast: integer 5 and string "5" +/// are equal, so the matching pair is dropped. 
+/// Fixture: {x:5} vs {x:"5"} → empty result (count 0). +#[test] +fn test_array_diff_assoc_string_cast_equality() { + let out = compile_and_run( + r#"<?php +$r = array_diff_assoc(["x" => 5], ["x" => "5"]); +echo count($r); +"#, + ); + assert_eq!(out, "0"); +} + +/// Verifies array_intersect_assoc() keeps only entries whose (key, value) pair appears in +/// the second array. +/// Fixture: {a:1,b:2,c:3} vs {a:1,b:5} → only a=1 matches. +#[test] +fn test_array_intersect_assoc_basic() { + let out = compile_and_run( + r#"<?php +$a = ["a" => 1, "b" => 2, "c" => 3]; +$b = ["a" => 1, "b" => 5]; +$r = array_intersect_assoc($a, $b); +foreach ($r as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "a=1;"); +} + +/// Verifies array_intersect_assoc() over string values keeps matching key+value pairs and +/// drops differing ones; exercises string-value retain plus temporary-box release. +/// Fixture: {name:"alice",role:"user"} vs {name:"alice",role:"admin"} → only name kept. +#[test] +fn test_array_intersect_assoc_string_values() { + let out = compile_and_run( + r#"<?php +$a = ["name" => "alice", "role" => "user"]; +$b = ["name" => "alice", "role" => "admin"]; +$r = array_intersect_assoc($a, $b); +foreach ($r as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "name=alice;"); +} + +/// Verifies array_diff_assoc() and array_intersect_assoc() are callable case-insensitively. +/// Fixture: mixed-case spellings over the same fixtures. +#[test] +fn test_assoc_diff_intersect_case_insensitive() { + let out = compile_and_run( + r#" 1, "b" => 2], ["a" => 1]); +echo count($d); +$i = Array_Intersect_Assoc(["a" => 1, "b" => 2], ["a" => 1]); +echo count($i); +"#, + ); + assert_eq!(out, "11"); +} + +/// Verifies array_replace_recursive() merges nested associative arrays key-by-key instead +/// of overwriting them wholesale. +/// Fixture: {cfg:{x:1,y:2}} replaced by {cfg:{y:9,z:3}} → cfg = {x:1,y:9,z:3}. 
+#[test] +fn test_array_replace_recursive_nested_merge() { + let out = compile_and_run( + r#"<?php +$base = ["cfg" => ["x" => 1, "y" => 2]]; +$over = ["cfg" => ["y" => 9, "z" => 3]]; +$r = array_replace_recursive($base, $over); +$c = $r["cfg"]; +echo $c["x"]; +echo $c["y"]; +echo $c["z"]; +"#, + ); + assert_eq!(out, "193"); +} + +/// Verifies array_replace_recursive() overwrites non-array values like array_replace. +/// Fixture: {a:1,b:2} replaced by {b:9} → a kept, b overwritten. +#[test] +fn test_array_replace_recursive_scalar_overwrite() { + let out = compile_and_run( + r#"<?php +$r = array_replace_recursive(["a" => 1, "b" => 2], ["b" => 9]); +echo $r["a"]; +echo $r["b"]; +"#, + ); + assert_eq!(out, "19"); +} + +/// Verifies array_replace_recursive() leaves the source arrays (and their nested arrays) +/// unchanged (copy-on-write through the recursive clone). +/// Fixture: nested {cfg:{x:1}} replaced by {cfg:{x:5}}; base nested x stays 1. +#[test] +fn test_array_replace_recursive_source_unchanged() { + let out = compile_and_run( + r#"<?php +$base = ["cfg" => ["x" => 1]]; +$over = ["cfg" => ["x" => 5]]; +$r = array_replace_recursive($base, $over); +$bc = $base["cfg"]; +$rc = $r["cfg"]; +echo $bc["x"]; +echo $rc["x"]; +"#, + ); + assert_eq!(out, "15"); +} + +/// Verifies array_merge_recursive() recursively merges nested associative arrays sharing a key. +/// Fixture: {cfg:{a:1}} + {cfg:{b:2}} → cfg = {a:1,b:2}. +#[test] +fn test_array_merge_recursive_nested_merge() { + let out = compile_and_run( + r#"<?php +$r = array_merge_recursive(["cfg" => ["a" => 1]], ["cfg" => ["b" => 2]]); +foreach ($r["cfg"] as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "a=1;b=2;"); +} + +/// Verifies array_merge_recursive() combines two scalar values at a colliding string key +/// into a renumbered list. +/// Fixture: {k:1} + {k:2} → k = [0=>1, 1=>2]. 
+#[test] +fn test_array_merge_recursive_scalar_combine() { + let out = compile_and_run( + r#"<?php +$r = array_merge_recursive(["k" => 1], ["k" => 2]); +echo count($r["k"]); +echo ":"; +foreach ($r["k"] as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "2:0=1;1=2;"); +} + +/// Verifies array_merge_recursive() keeps non-colliding string keys from both arrays. +/// Fixture: {a:1} + {b:2} → {a:1, b:2}. +#[test] +fn test_array_merge_recursive_no_collision() { + let out = compile_and_run( + r#"<?php +$r = array_merge_recursive(["a" => 1], ["b" => 2]); +foreach ($r as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "a=1;b=2;"); +} + +/// Verifies array_merge_recursive() renumbers integer keys sequentially across both arrays. +/// Fixture: {5:10} + {9:20} → {0:10, 1:20}. +#[test] +fn test_array_merge_recursive_int_keys_renumber() { + let out = compile_and_run( + r#"<?php +$r = array_merge_recursive([5 => 10], [9 => 20]); +foreach ($r as $k => $v) { echo "$k=$v;"; } +"#, + ); + assert_eq!(out, "0=10;1=20;"); +} + +/// Verifies array_merge_recursive() string-scalar collisions combine into a list with the +/// correct string values preserved (persisted independently of the temporary wrappers). +/// Fixture: {k:"ab"} + {k:"cd"} → k = ["ab", "cd"]. +#[test] +fn test_array_merge_recursive_string_combine() { + let out = compile_and_run( + r#"<?php +$r = array_merge_recursive(["k" => "ab"], ["k" => "cd"]); +echo count($r["k"]); +echo ":"; +foreach ($r["k"] as $v) { echo $v; echo ","; } +"#, + ); + assert_eq!(out, "2:ab,cd,"); +} + +/// Verifies array_merge_recursive() is callable case-insensitively, matching PHP builtin name rules. +/// Fixture: mixed-case spelling combining two scalar values. 
+#[test] +fn test_array_merge_recursive_case_insensitive() { + let out = compile_and_run( + r#" 1], ["k" => 2]); +echo count($r["k"]); +"#, + ); + assert_eq!(out, "2"); +} + +/// Verifies array_multisort() sorts the first array ascending and reorders the second in tandem, +/// mutating both arrays in place (by reference). +/// Fixture: [3,1,2] + [30,10,20] → [1,2,3] + [10,20,30]. 
+#[test] +fn test_array_multisort_parallel() { + let out = compile_and_run( + r#" 99]); +foreach ($r as $k => $v) { echo "$k:$v "; } +"#, + ); + assert_eq!(out, "0:10 1:99 2:30 "); +} + +/// Verifies array_replace() with an indexed first array and an associative second array that +/// shares the integer key space (homogeneous keys) overwrites and appends correctly. +/// Fixture: [1,2,3] replaced by {1:99, 3:40} → 0:1 1:99 2:3 3:40. +#[test] +fn test_array_replace_indexed_then_int_keys() { + let out = compile_and_run( + r#" 99, 3 => 40]); +foreach ($r as $k => $v) { echo "$k:$v "; } +"#, + ); + assert_eq!(out, "0:1 1:99 2:3 3:40 "); +} + +/// Verifies array_replace_recursive() accepts scalar indexed inputs (converted to hashes). +/// Fixture: [1,2,3] replaced by {1:9} → 0:1 1:9 2:3. +#[test] +fn test_array_replace_recursive_indexed_inputs() { + let out = compile_and_run( + r#" 9]); +foreach ($r as $k => $v) { echo "$k:$v "; } +"#, + ); + assert_eq!(out, "0:1 1:9 2:3 "); +} + +/// Verifies array_diff_assoc() accepts indexed inputs, keeping first-array entries whose +/// (key, value) pair is absent from the second array. +/// Fixture: [1,2,3] minus [1,5] → 1:2 2:3 (index 0 matches value 1, index 1 differs). +#[test] +fn test_array_diff_assoc_indexed_inputs() { + let out = compile_and_run( + r#" $v) { echo "$k:$v "; } +"#, + ); + assert_eq!(out, "1:2 2:3 "); +} + +/// Verifies array_intersect_assoc() accepts indexed inputs, keeping first-array entries whose +/// (key, value) pair is present in the second array. +/// Fixture: [1,2,3] intersect [1,5] → 0:1 (only index 0 matches by key and value). +#[test] +fn test_array_intersect_assoc_indexed_inputs() { + let out = compile_and_run( + r#" $v) { echo "$k:$v "; } +"#, + ); + assert_eq!(out, "0:1 "); +} + +/// Verifies array_merge_recursive() accepts indexed inputs, appending and renumbering integer keys. +/// Fixture: [1,2] merged with [3,4] → 0:1 1:2 2:3 3:4. 
+#[test] +fn test_array_merge_recursive_indexed_inputs() { + let out = compile_and_run( + r#" $v) { echo "$k:$v "; } +"#, + ); + assert_eq!(out, "0:1 1:2 2:3 3:4 "); +} diff --git a/tests/codegen/arrays/callbacks.rs b/tests/codegen/arrays/callbacks.rs index df9e02c62..8dbf2bda4 100644 --- a/tests/codegen/arrays/callbacks.rs +++ b/tests/codegen/arrays/callbacks.rs @@ -303,6 +303,204 @@ array_walk($a, "show"); assert_eq!(out, "102030"); } +/// Verifies array_walk_recursive() visits the leaf values of a nested indexed array +/// depth-first, invoking the callback on each scalar leaf. +/// Fixture: [[1,2],[3,4]] → leaves 1,2,3,4 in order. +#[test] +fn test_array_walk_recursive_indexed() { + let out = compile_and_run( + r#" ["a" => 1, "b" => 2], "g2" => ["c" => 3]]; +array_walk_recursive($a, "p"); +"#, + ); + assert_eq!(out, "1;2;3;"); +} + +/// Verifies array_walk_recursive() recurses through three levels of nesting. +/// Fixture: [[[1,2]],[[3]]] → leaves 1,2,3. +#[test] +fn test_array_walk_recursive_deep() { + let out = compile_and_run( + r#" 2 in [1,2,3,4] is 3. +#[test] +fn test_array_find_returns_first_match() { + let out = compile_and_run( + r#" 2; } +echo array_find([1, 2, 3, 4], "gt2"); +"#, + ); + assert_eq!(out, "3"); +} + +/// Verifies array_find() returns null when no element satisfies the predicate. +/// Fixture: no element > 2 in [1,2]; the boxed null compares equal to null. +#[test] +fn test_array_find_returns_null_when_absent() { + let out = compile_and_run( + r#" 2; } +$r = array_find([1, 2], "gt2"); +echo ($r === null) ? "null" : "value"; +"#, + ); + assert_eq!(out, "null"); +} + +/// Verifies array_find() works with a closure predicate. +/// Fixture: first element >= 10 in [5,10,15] is 10. +#[test] +fn test_array_find_closure() { + let out = compile_and_run( + r#" $x >= 10); +echo $r; +"#, + ); + assert_eq!(out, "10"); +} + +/// Verifies array_any() (PHP 8.4) returns true iff some element satisfies the predicate. 
+/// Fixture: [1,2,3] has an element > 2 (true); [1,2] does not (false). +#[test] +fn test_array_any() { + let out = compile_and_run( + r#" 2; } +echo array_any([1, 2, 3], "gt2") ? "y" : "n"; +echo array_any([1, 2], "gt2") ? "y" : "n"; +"#, + ); + assert_eq!(out, "yn"); +} + +/// Verifies array_all() (PHP 8.4) returns true iff every element satisfies the predicate. +/// Fixture: all of [1,2,3] are positive (true); [1,-2,3] is not all positive (false). +#[test] +fn test_array_all() { + let out = compile_and_run( + r#" 0; } +echo array_all([1, 2, 3], "pos") ? "y" : "n"; +echo array_all([1, -2, 3], "pos") ? "y" : "n"; +"#, + ); + assert_eq!(out, "yn"); +} + +/// Verifies array_find / array_any / array_all are callable case-insensitively. +/// Fixture: mixed-case spellings over a small numeric array. +#[test] +fn test_array_find_any_all_case_insensitive() { + let out = compile_and_run( + r#" 0; } +echo Array_Find([3, 6], "pos"); +echo Array_Any([0, 0], "pos") ? "y" : "n"; +echo Array_All([1, 2], "pos") ? "y" : "n"; +"#, + ); + assert_eq!(out, "3ny"); +} + +/// Verifies array_udiff() keeps elements of the first array whose comparator never returns 0 +/// against any element of the second array. +/// Fixture: udiff([1,2,3,4], [2,4]) with a numeric comparator keeps 1 and 3. +#[test] +fn test_array_udiff_string_comparator() { + let out = compile_and_run( + r#" $a <=> $b); +foreach ($r as $v) { echo $v; echo ","; } +"#, + ); + assert_eq!(out, "5,15,"); +} + +/// Verifies array_udiff() / array_uintersect() result sizes and case-insensitive calls. +/// Fixture: udiff keeps 2 of 4; uintersect keeps 2 of 4. +#[test] +fn test_array_udiff_uintersect_case_insensitive() { + let out = compile_and_run( + r#" 1, "b" => 2]) ? "y" : "n"; +echo array_is_list([5 => "x", 6 => "y"]) ? "y" : "n"; +echo array_is_list([]) ? 
"y" : "n"; +"#, + ); + assert_eq!(out, "ynny"); +} + +/// Verifies array_is_list() walks a hash produced by json_decode($s, true): a JSON array +/// decodes to a list-shaped hash (true), a JSON object decodes to a string-keyed hash (false). +/// Fixture: json_decode of a numeric array and of an object, both as associative. +#[test] +fn test_array_is_list_runtime_hash() { + let out = compile_and_run( + r#" 1, "y" => 2, "z" => 3]; +echo array_key_first($m); +echo array_key_last($m); +"#, + ); + assert_eq!(out, "xz"); +} + +/// Verifies array_key_first()/array_key_last() return integer keys from an out-of-order hash. +/// Fixture: an integer-keyed associative array inserted as 3, 1, 7; first 3, last 7. +#[test] +fn test_array_key_edge_assoc_int() { + let out = compile_and_run( + r#" "a", 1 => "b", 7 => "c"]; +echo array_key_first($m); +echo array_key_last($m); +"#, + ); + assert_eq!(out, "37"); +} + +/// Verifies array_key_first()/array_key_last() return null for an empty array. +/// Fixture: an empty array literal compared strictly against null. +#[test] +fn test_array_key_edge_empty_is_null() { + let out = compile_and_run( + r#" 1]; array_replace($a);", + "array_replace() takes exactly 2 arguments", + ); +} + +/// Verifies that array_replace() rejects string-element indexed arrays (scalar indexed inputs +/// are supported; string/heap element indexed inputs are a follow-up). +#[test] +fn test_error_array_replace_string_indexed_unsupported() { + expect_error( + " 1]; array_diff_assoc($a);", + "array_diff_assoc() takes exactly 2 arguments", + ); +} + +/// Verifies that array_intersect_assoc() rejects string-element indexed arrays (scalar indexed +/// inputs are supported; string/heap element indexed inputs are a follow-up). 
+#[test] +fn test_error_array_intersect_assoc_string_indexed_unsupported() { + expect_error( + " 1]; array_replace_recursive($a);", + "array_replace_recursive() takes exactly 2 arguments", + ); +} + +/// Verifies that array_replace_recursive() rejects string-element indexed arrays (scalar indexed +/// inputs are supported; string/heap element indexed inputs are a follow-up). +#[test] +fn test_error_array_replace_recursive_string_indexed_unsupported() { + expect_error( + " 1]; array_merge_recursive($a);", + "array_merge_recursive() takes exactly 2 arguments", + ); +} + +/// Verifies that array_merge_recursive() rejects string-element indexed arrays (scalar indexed +/// inputs are supported; string/heap element indexed inputs are a follow-up). +#[test] +fn test_error_array_merge_recursive_string_indexed_unsupported() { + expect_error( + "