Clean up string conversion semantics #54
Pull request overview
This PR tightens Ferret’s string conversion semantics by introducing a Stringable constraint and a new to_str<T: Stringable>(value) formatting primitive, while redefining str ↔ []u8 casts as zero-copy view reshapes and rejecting allocating casts (e.g. str as []char, []char as str). It updates the typechecker, MIR lowering, LLVM backend, standard library declarations, tests, and language semantics documentation to match these rules.
Changes:
- Add `Stringable` as a predeclared constraint and enforce it in the typechecker (including a new `to_str` conversion path).
- Change cast rules: only allow `str` ↔ readonly `[]u8` as view casts; require `to_str` for numeric and `[]char` conversions.
- Update stdlib/global declarations, backend lowering, and tests/docs to reflect the new semantics.
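The cast rules listed above can be sketched as a small decision function. This is an illustrative Go sketch only: `castKind`, `classifyCast`, and the set of numeric names are hypothetical and are not taken from the Ferret typechecker, and the sketch ignores the readonly qualifier on `[]u8`.

```go
package main

import "fmt"

// castKind is a hypothetical classification used only in this sketch.
type castKind int

const (
	viewCast    castKind = iota // zero-copy reshape, allowed with `as`
	needsToStr                  // formatting or encoding, must go through to_str
	needsHelper                 // must use an explicit helper such as str_chars
)

// isNumeric uses an assumed, incomplete list of numeric type names.
func isNumeric(name string) bool {
	switch name {
	case "i32", "i64", "u8", "u64", "usize":
		return true
	}
	return false
}

// classifyCast reports how a string-related conversion has to be written
// under the rules in this PR. ok is false for pairs the sketch does not model.
func classifyCast(from, to string) (kind castKind, ok bool) {
	switch {
	case from == "str" && to == "[]u8", from == "[]u8" && to == "str":
		return viewCast, true
	case from == "[]char" && to == "str":
		return needsToStr, true
	case from == "str" && to == "[]char":
		return needsHelper, true
	case isNumeric(from) && to == "str":
		return needsToStr, true
	}
	return 0, false
}

func main() {
	kind, ok := classifyCast("i32", "str")
	fmt.Println(kind == needsToStr, ok)
}
```

Types that provide a `String` method are left out of the sketch; they are covered by the method coercion discussed in the review comments.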
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/semma/string_realworld_ok.fer | Switch str -> []char cast to explicit str_chars(&s) helper. |
| tests/semma/string_index.fer | Replace s as []char with str_chars(&s) under new rules. |
| tests/semma/str_ready.fer | Update []char -> str and str -> []char paths to to_str / str_chars. |
| tests/semma/io_file_write.fer | Replace numeric as str with to_str(...). |
| tests/semma/io_buffer.fer | Add parentheses around as str comparisons (parse/precedence clarity). |
| tests/repro/tcp_listener_echo_probe.fer | Add parentheses around data as str comparison. |
| tests/repro/tcp_conn_controls_probe.fer | Add parentheses around data as str comparison. |
| tests/repro/tcp_client_echo_probe.fer | Add parentheses around data as str comparison. |
| tests/repro/map_index_probe.fer | Replace i32 as str with to_str(...) on map indexing output. |
| tests/repro/map_index_missing_panic.fer | Replace i32 as str with to_str(...) for missing-key panic path. |
| tests/repro/map_field_mut_probe.fer | Replace usize as str with to_str(...) for size printing. |
| tests/repro/map_builtin_probe.fer | Replace multiple numeric as str conversions with to_str(...). |
| internal/ir/mir/simplify_test.go | Update embedded Ferret test snippets to use to_str. |
| internal/ir/mir/lower.go | Add MIR lowering for builtin to_str calls (numeric / []char / method coercion). |
| internal/backend/llvm/lower_test.go | Update LLVM lowering tests to assert no allocating string conversions are emitted. |
| internal/backend/llvm/llvm.go | Remove special string/numeric cast lowering in favor of new semantics and MIR lowering. |
| internal/analysis/semantics/typeinfo/types.go | Introduce StringableConstraint and equality support. |
| internal/analysis/semantics/typechecker/typechecker_test.go | Add/adjust tests for rejecting numeric casts, allowing view casts, and validating to_str. |
| internal/analysis/semantics/typechecker/typechecker.go | Enforce Stringable constraints and restrict explicit string casts to view-only []u8 reshapes. |
| internal/analysis/semantics/typechecker/syntax_types.go | Parse Stringable as a predeclared constraint type and refactor unqualified-name handling. |
| internal/analysis/semantics/resolver/resolver.go | Treat Stringable as a predeclared type for resolution. |
| ferret_libs_dev/std/net/http.fer | Replace numeric as str formatting with to_str when writing responses. |
| ferret_libs_dev/global.fer | Add extern generic to_str and update string helper FFI signatures to use references. |
| dry_test.fer | Replace numeric as str with to_str in Point::String. |
| CURRENT_SEMANTICS.md | Document view-cast semantics and the new to_str/Stringable conversion model. |
| - `[]u8 as str` produces a trusted text view with the same `{ptr, len}` backing storage | ||
| - explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available |
[]u8 as str is documented as producing a “trusted text view”, but it’s not explicit whether any UTF-8 validation occurs (and if not, that downstream string operations will assume validity). Consider clarifying this bullet (and/or referencing the allocating helper) to make the safety contract unambiguous.
| - `[]u8 as str` produces a trusted text view with the same `{ptr, len}` backing storage | |
| - explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available | |
| - `[]u8 as str` produces a trusted text view with the same `{ptr, len}` backing storage and does not perform UTF-8 validation; it is only valid when the source bytes are already valid UTF-8, and downstream `str` operations assume that invariant | |
| - explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available; use `bytes_str` when you want the byte-to-string conversion path to be spelled explicitly rather than relying on `[]u8 as str` |
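The contract in this suggestion (no validation on the cast, so the caller must already know the bytes are valid UTF-8) can be illustrated with an analogous check in Go. This is not Ferret's runtime: `checkedText` and `errInvalidUTF8` are hypothetical names, and Go's `string(b)` copies rather than creating a view. Only the validate-before-trusting step is the point.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical sentinel error for this sketch.
var errInvalidUTF8 = errors.New("bytes are not valid UTF-8")

// checkedText validates the bytes before treating them as text.
// A trusted cast would skip the utf8.Valid call and rely on the caller.
func checkedText(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", errInvalidUTF8
	}
	return string(b), nil
}

func main() {
	if _, err := checkedText([]byte{0xff, 0xfe}); err != nil {
		fmt.Println("rejected:", err)
	}
	if s, err := checkedText([]byte("héllo")); err == nil {
		fmt.Println("accepted:", s)
	}
}
```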
| Args: []Value{&AddrOfValue{ | ||
| baseValue: baseValue{Location: arg.Loc(), ExprType: &typeinfo.RefType{Inner: argType}}, | ||
| Source: lowerAddrSource(c, arg), | ||
| }}, |
lowerBuiltinToStrCallValue lowers to_str([]char) by taking &arg via AddrOfValue{Source: lowerAddrSource(c, arg)}. This fails for non-addressable slice expressions (e.g. to_str(str_chars(&s)) or any call/temporary returning []char), because the LLVM backend lowerAddrOf only supports locals/names/places and will reject AddrOfValue whose Source is a call/composite. Consider either (a) passing the slice value directly to the runtime helper (adjusting ABI to accept byval), or (b) materializing the slice into a temp local when it isn’t addressable, then taking the temp’s address.
| Args: []Value{&AddrOfValue{ | |
| baseValue: baseValue{Location: arg.Loc(), ExprType: &typeinfo.RefType{Inner: argType}}, | |
| Source: lowerAddrSource(c, arg), | |
| }}, | |
| Args: []Value{lowerValue(c, arg)}, |
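Option (b) from the comment, materializing a non-addressable slice into a temp before taking its address, might look roughly like the following. This is a toy model under stated assumptions: `expr`, `lowerer`, `addrOf`, and the `let` statement text are hypothetical and do not mirror the real MIR types in `internal/ir/mir`.

```go
package main

import "fmt"

// expr is a toy expression: its source text plus whether it names a place.
type expr struct {
	text        string
	addressable bool
}

// lowerer collects the statements emitted while lowering (hypothetical).
type lowerer struct {
	nextTemp int
	stmts    []string
}

// addrOf returns an address operand for e. Places are addressed directly;
// calls and other temporaries are first stored into a fresh local.
func (l *lowerer) addrOf(e expr) string {
	if e.addressable {
		return "&" + e.text
	}
	name := fmt.Sprintf("tmp%d", l.nextTemp)
	l.nextTemp++
	l.stmts = append(l.stmts, "let "+name+" = "+e.text)
	return "&" + name
}

func main() {
	l := &lowerer{}
	fmt.Println(l.addrOf(expr{text: "chars", addressable: true}))
	fmt.Println(l.addrOf(expr{text: "str_chars(&s)"}))
	fmt.Println(l.stmts)
}
```

The suggested change above takes option (a) instead and passes the slice by value, which avoids the temp but changes the helper's ABI.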
| - explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available | ||
| - direct allocating casts such as `str as []char` and `[]char as str` are rejected | ||
| - numeric formatting and `[]char` encoding use `to_str<T: Stringable>(value)`, not `as str` | ||
| - types that provide `String() -> str` can use `value as str` |
The semantics doc says “types that provide String() -> str can use value as str”, but the language’s string-method coercion uses a receiver form (e.g. fn Name::String(self) -> str). Please update the documentation to reflect the actual required signature (include the self receiver) so users don’t implement an incorrect method shape.
| - types that provide `String() -> str` can use `value as str` | |
| - types that provide `fn Name::String(self) -> str` can use `value as str` |
…anup # Conflicts: # dry_test.fer
| func (c *checker) isStringableType(typ typeinfo.Type) bool { | ||
| return c.isStringType(typ) || | ||
| typeinfo.IsNumeric(c.underlying(typ)) || | ||
| c.isCharSliceType(typ) || | ||
| c.isStringMethodCoercion(&typeinfo.StringType{}, typ) | ||
| } |
isStringableType doesn’t unwrap ApproxType ("~T") before checking IsNumeric, so values typed as e.g. ~i32/~u64 won’t satisfy the new Stringable constraint even though Comparable and other type logic treat ApproxType as a transparent wrapper. Consider handling *typeinfo.ApproxType (and delegating to its Inner) so to_str works consistently for approximate numeric types.
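A minimal sketch of the unwrapping this comment asks for, using simplified stand-ins for the `typeinfo` types. The `Type`, `BuiltinType`, and `ApproxType` definitions below are assumptions made for illustration, as are `unwrapApprox` and the list of numeric names; the real package may differ.

```go
package main

import "fmt"

// Type is a stand-in for typeinfo.Type (hypothetical, simplified).
type Type interface{}

// BuiltinType models a named builtin such as i32.
type BuiltinType struct{ Name string }

// ApproxType models "~T" as a transparent wrapper around Inner.
type ApproxType struct{ Inner Type }

// unwrapApprox peels any number of ApproxType layers.
func unwrapApprox(t Type) Type {
	for {
		a, ok := t.(*ApproxType)
		if !ok || a == nil {
			return t
		}
		t = a.Inner
	}
}

// isNumeric checks the unwrapped type against an assumed set of names.
func isNumeric(t Type) bool {
	b, ok := unwrapApprox(t).(*BuiltinType)
	if !ok || b == nil {
		return false
	}
	switch b.Name {
	case "i32", "i64", "u8", "u64", "usize":
		return true
	}
	return false
}

func main() {
	approx := &ApproxType{Inner: &BuiltinType{Name: "i32"}}
	fmt.Println(isNumeric(approx))
	fmt.Println(isNumeric(&BuiltinType{Name: "str"}))
}
```

Running the same unwrap step in both the typechecker and the MIR lowering keeps the two in agreement about which types count as numeric.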
| if c != nil { | ||
| if family, _, ok := typeinfo.NumericInfo(argType); ok { | ||
| targetType := typeinfo.Type(&typeinfo.BuiltinType{Name: "i64"}) | ||
| linkName := "ferret_global_i64_str" |
lowerBuiltinToStrCallValue uses typeinfo.NumericInfo(argType) directly. If argType can be wrapped (e.g. ApproxType) or otherwise not a plain *BuiltinType, this will fail to lower to_str calls and may leave an unresolved extern to_str call in MIR/LLVM. Consider unwrapping to the underlying numeric type before calling NumericInfo (matching the typechecker’s notion of numeric/stringable).
| // Converts Stringable values to text when formatting or encoding is required. | ||
| // Numeric types, str, []char, and types with String(self) -> str satisfy Stringable. | ||
| // Use `as` for zero-copy reshapes. |
The to_str docs here recommend using as for zero-copy reshapes, but don’t mention that []u8 as str is a trusted view that does not validate UTF-8 (per CURRENT_SEMANTICS). Consider adding an explicit warning in this comment block so callers don’t assume the cast is validating/safe for arbitrary bytes.
| // Use `as` for zero-copy reshapes. | |
| // Use `as` for zero-copy reshapes. | |
| // Warning: `[]u8 as str` creates a trusted view and does not validate UTF-8. | |
| // Cast arbitrary bytes only when they are already known to be valid UTF-8. |
Unwrap ApproxType when checking Stringable and when lowering builtin to_str so numeric approximations use underlying integer type. Add unit tests for typechecker and LLVM lowering. Add warning in global.fer doc comment about unsafe []u8 as str casts.
This pull request introduces a new `Stringable` constraint to the type system, refines how string and byte slice casts work, and updates both the language semantics and standard library helpers accordingly. The changes enforce stricter rules for string conversions, introduce a generic `to_str` formatting function, and update the typechecker and backend to support these semantics. Several tests are added and updated to verify the new behaviors.
Language semantics and type system improvements:
- Added a `Stringable` constraint type, allowing generic formatting of types that can be converted to strings (including numeric types, `str`, `[]char`, and types with a `String() -> str` method). The typechecker now recognizes and enforces this constraint.
- `str as []u8` and `[]u8 as str` are now zero-copy view casts, while allocating casts (such as `str as []char` or `[]char as str`) are rejected. Numeric formatting and `[]char` encoding now require `to_str<T: Stringable>(value)`, not direct casts.
Standard library and API changes:
- Added a `to_str<T: Stringable>(value: T) -> str` function for formatting. Updated standard library and user code to use `to_str` instead of direct casts for numeric/string conversions.
- String helper FFI signatures now take references (`&str`, `&[]u8`, `&[]char`) instead of pointers, reflecting the new ABI and view semantics.
Typechecker and backend updates:
- The typechecker and backend now recognize the `Stringable` constraint, enforce the new casting rules, and reject invalid or allocating string casts.
Tests and documentation:
- Added and updated tests covering the cast rules, `to_str` conversions, and the new constraint logic.
- Updated the language semantics documentation (`CURRENT_SEMANTICS.md`) to describe the new casting and formatting rules.
These changes make string formatting and casting safer and more explicit, and provide a more robust foundation for generic code working with string-like types.