Clean up string conversion semantics #54

Merged: itsfuad merged 3 commits into main from string-conversion-cleanup on Apr 22, 2026
Conversation

itsfuad (Member) commented Apr 22, 2026

This pull request introduces a new Stringable constraint to the type system, refines how string and byte slice casts work, and updates both the language semantics and standard library helpers accordingly. The changes enforce stricter rules for string conversions, introduce a generic to_str formatting function, and update the typechecker and backend to support these semantics. Several tests are added and updated to verify the new behaviors.

Language semantics and type system improvements:

  • Added a new Stringable constraint type, allowing generic formatting of types that can be converted to strings (numeric types, str, []char, and types with a String(self) -> str method). The typechecker now recognizes and enforces this constraint.
  • Updated string and byte slice casting rules: str as []u8 and []u8 as str are now zero-copy view casts, while allocating casts (such as str as []char or []char as str) are rejected. Numeric formatting and []char encoding now require to_str<T: Stringable>(value), not direct casts.
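The cast rules above can be summarized as a small decision table. The Go sketch below is only an illustration of those rules, not code from the Ferret typechecker: `CastKind`, `ClassifyCast`, and the list of numeric type names are hypothetical, types are modeled as plain strings, and the readonly qualifier and `String(self) -> str` method coercion are deliberately left out.

```go
package main

import "fmt"

// CastKind is a hypothetical label for how a cast is treated under the new rules.
type CastKind int

const (
	ViewCast    CastKind = iota // allowed: zero-copy view cast
	UseToStr                    // rejected as a cast: call to_str(value) instead
	UseStrChars                 // rejected as a cast: call str_chars(&s) instead
	OutOfScope                  // not covered by this sketch
)

// numericNames is an assumed set of numeric type names, for illustration only.
var numericNames = map[string]bool{
	"i8": true, "i16": true, "i32": true, "i64": true,
	"u8": true, "u16": true, "u32": true, "u64": true,
	"usize": true, "f32": true, "f64": true,
}

// ClassifyCast reports how `from as to` is handled for the string-related casts
// described in the pull request.
func ClassifyCast(from, to string) CastKind {
	switch {
	case from == "str" && to == "[]u8", from == "[]u8" && to == "str":
		return ViewCast
	case to == "str" && (from == "[]char" || numericNames[from]):
		return UseToStr
	case from == "str" && to == "[]char":
		return UseStrChars
	default:
		return OutOfScope
	}
}

func main() {
	fmt.Println(ClassifyCast("str", "[]u8") == ViewCast)      // view cast is allowed
	fmt.Println(ClassifyCast("i32", "str") == UseToStr)       // numeric formatting goes through to_str
	fmt.Println(ClassifyCast("str", "[]char") == UseStrChars) // decoding goes through str_chars
}
```

Keeping the view casts in the first case matters: `[]u8 as str` must be matched before any numeric check, since `u8` on its own would otherwise be routed to `to_str`.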

Standard library and API changes:

  • Introduced the generic to_str<T: Stringable>(value: T) -> str function for formatting. Updated standard library and user code to use to_str instead of direct casts for numeric/string conversions.
  • Changed FFI signatures and documentation for string helpers to use references (&str, &[]u8, &[]char) instead of pointers, reflecting the new ABI and view semantics.
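For readers more familiar with Go than Ferret, the shape of `to_str` can be approximated with a type switch. This is a rough analogue under stated assumptions, not the Ferret runtime: `ToStr` and `Point` are hypothetical names, Go's `[]rune` stands in for `[]char`, `fmt.Stringer` stands in for a `String(self) -> str` method, and the check happens at run time rather than through a compile-time constraint.

```go
package main

import (
	"fmt"
	"strconv"
)

// ToStr is a hypothetical Go analogue of to_str. The boolean result reports
// whether the value belongs to the "stringable" set modeled here.
func ToStr(v any) (string, bool) {
	switch x := v.(type) {
	case string:
		return x, true // already text
	case []rune:
		return string(x), true // encode code points as UTF-8
	case int:
		return strconv.Itoa(x), true
	case int32:
		return strconv.FormatInt(int64(x), 10), true
	case int64:
		return strconv.FormatInt(x, 10), true
	case uint64:
		return strconv.FormatUint(x, 10), true
	case float64:
		return strconv.FormatFloat(x, 'g', -1, 64), true
	case fmt.Stringer:
		return x.String(), true // user-defined formatting method
	default:
		return "", false // e.g. []byte: use a view cast or helper instead
	}
}

// Point is an illustrative type with a formatting method.
type Point struct{ X, Y int }

func (p Point) String() string {
	return "(" + strconv.Itoa(p.X) + ", " + strconv.Itoa(p.Y) + ")"
}

func main() {
	for _, v := range []any{42, []rune("hi"), Point{1, 2}, []byte("raw")} {
		s, ok := ToStr(v)
		fmt.Printf("%q %v\n", s, ok)
	}
}
```

The `default` branch mirrors the intent of the constraint: byte slices are not formatted, they are reinterpreted through the view cast or an explicit helper.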

Typechecker and backend updates:

  • Refactored typechecker logic to handle the new Stringable constraint, enforce the new casting rules, and reject invalid or allocating string casts.
  • Updated the backend to remove special handling for string/numeric casts and align with the new zero-copy cast semantics.

Tests and documentation:

  • Added and updated tests to verify allowed and rejected casts, to_str conversions, and the new constraint logic.
  • Updated language documentation (CURRENT_SEMANTICS.md) to describe the new casting and formatting rules.

These changes make string formatting and casting safer and more explicit, and provide a more robust foundation for generic code working with string-like types.

Copilot AI left a comment


Pull request overview

This PR tightens Ferret’s string conversion semantics by introducing a Stringable constraint and a new to_str<T: Stringable>(value) formatting primitive, while redefining str ↔ []u8 casts as zero-copy view reshapes and rejecting allocating casts (e.g. str as []char, []char as str). It updates the typechecker, MIR lowering, LLVM backend, standard library declarations, tests, and language semantics documentation to match these rules.

Changes:

  • Add Stringable as a predeclared constraint and enforce it in the typechecker (including a new to_str conversion path).
  • Change cast rules: only allow str ↔ readonly []u8 as view casts; require to_str for numeric and []char conversions.
  • Update stdlib/global declarations, backend lowering, and tests/docs to reflect the new semantics.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/semma/string_realworld_ok.fer | Switch str -> []char cast to explicit str_chars(&s) helper. |
| tests/semma/string_index.fer | Replace s as []char with str_chars(&s) under new rules. |
| tests/semma/str_ready.fer | Update []char -> str and str -> []char paths to to_str / str_chars. |
| tests/semma/io_file_write.fer | Replace numeric as str with to_str(...). |
| tests/semma/io_buffer.fer | Add parentheses around as str comparisons (parse/precedence clarity). |
| tests/repro/tcp_listener_echo_probe.fer | Add parentheses around data as str comparison. |
| tests/repro/tcp_conn_controls_probe.fer | Add parentheses around data as str comparison. |
| tests/repro/tcp_client_echo_probe.fer | Add parentheses around data as str comparison. |
| tests/repro/map_index_probe.fer | Replace i32 as str with to_str(...) on map indexing output. |
| tests/repro/map_index_missing_panic.fer | Replace i32 as str with to_str(...) for missing-key panic path. |
| tests/repro/map_field_mut_probe.fer | Replace usize as str with to_str(...) for size printing. |
| tests/repro/map_builtin_probe.fer | Replace multiple numeric as str conversions with to_str(...). |
| internal/ir/mir/simplify_test.go | Update embedded Ferret test snippets to use to_str. |
| internal/ir/mir/lower.go | Add MIR lowering for builtin to_str calls (numeric / []char / method coercion). |
| internal/backend/llvm/lower_test.go | Update LLVM lowering tests to assert no allocating string conversions are emitted. |
| internal/backend/llvm/llvm.go | Remove special string/numeric cast lowering in favor of new semantics and MIR lowering. |
| internal/analysis/semantics/typeinfo/types.go | Introduce StringableConstraint and equality support. |
| internal/analysis/semantics/typechecker/typechecker_test.go | Add/adjust tests for rejecting numeric casts, allowing view casts, and validating to_str. |
| internal/analysis/semantics/typechecker/typechecker.go | Enforce Stringable constraints and restrict explicit string casts to view-only []u8 reshapes. |
| internal/analysis/semantics/typechecker/syntax_types.go | Parse Stringable as a predeclared constraint type and refactor unqualified-name handling. |
| internal/analysis/semantics/resolver/resolver.go | Treat Stringable as a predeclared type for resolution. |
| ferret_libs_dev/std/net/http.fer | Replace numeric as str formatting with to_str when writing responses. |
| ferret_libs_dev/global.fer | Add extern generic to_str and update string helper FFI signatures to use references. |
| dry_test.fer | Replace numeric as str with to_str in Point::String. |
| CURRENT_SEMANTICS.md | Document view-cast semantics and the new to_str/Stringable conversion model. |

Comment thread CURRENT_SEMANTICS.md Outdated
Comment on lines +568 to +569
- `[]u8 as str` produces a trusted text view with the same `{ptr, len}` backing storage
- explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available

Copilot AI Apr 22, 2026


[]u8 as str is documented as producing a “trusted text view”, but it’s not explicit whether any UTF-8 validation occurs (and if not, that downstream string operations will assume validity). Consider clarifying this bullet (and/or referencing the allocating helper) to make the safety contract unambiguous.

Suggested change
- `[]u8 as str` produces a trusted text view with the same `{ptr, len}` backing storage
- explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available
- `[]u8 as str` produces a trusted text view with the same `{ptr, len}` backing storage and does not perform UTF-8 validation; it is only valid when the source bytes are already valid UTF-8, and downstream `str` operations assume that invariant
- explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available; use `bytes_str` when you want the byte-to-string conversion path to be spelled explicitly rather than relying on `[]u8 as str`
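The safety contract discussed here has a close counterpart in Go, which may help make it concrete. The sketch below is an analogy rather than Ferret code: `TrustedView` and `CheckedView` are hypothetical names, and `unsafe.String` (Go 1.20 or later) plays the role of the zero-copy `[]u8 as str` view.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// TrustedView reinterprets b as a string without copying and without checking
// that the bytes are valid UTF-8. The caller must not mutate b afterwards.
func TrustedView(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// CheckedView validates the bytes first and only then creates the view.
func CheckedView(b []byte) (string, bool) {
	if !utf8.Valid(b) {
		return "", false
	}
	return TrustedView(b), true
}

func main() {
	good := []byte("héllo")
	bad := []byte{0xff, 0xfe}

	s, ok := CheckedView(good)
	fmt.Println(s, ok)

	_, ok = CheckedView(bad)
	fmt.Println(ok)

	// The trusted view accepts the invalid bytes silently; any later code that
	// assumes valid UTF-8 inherits the problem.
	fmt.Println(len(TrustedView(bad)))
}
```

The trade-off is the same one the comment asks the documentation to spell out: the trusted view costs nothing at the cast site, and the validity check becomes the caller's responsibility.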

Comment thread internal/ir/mir/lower.go Outdated
Comment on lines +1081 to +1084
Args: []Value{&AddrOfValue{
    baseValue: baseValue{Location: arg.Loc(), ExprType: &typeinfo.RefType{Inner: argType}},
    Source:    lowerAddrSource(c, arg),
}},

Copilot AI Apr 22, 2026


lowerBuiltinToStrCallValue lowers to_str([]char) by taking &arg via AddrOfValue{Source: lowerAddrSource(c, arg)}. This fails for non-addressable slice expressions (e.g. to_str(str_chars(&s)) or any call/temporary returning []char), because the LLVM backend lowerAddrOf only supports locals/names/places and will reject AddrOfValue whose Source is a call/composite. Consider either (a) passing the slice value directly to the runtime helper (adjusting ABI to accept byval), or (b) materializing the slice into a temp local when it isn’t addressable, then taking the temp’s address.

Suggested change
Args: []Value{&AddrOfValue{
    baseValue: baseValue{Location: arg.Loc(), ExprType: &typeinfo.RefType{Inner: argType}},
    Source:    lowerAddrSource(c, arg),
}},
Args: []Value{lowerValue(c, arg)},
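The suggested change follows option (a), passing the slice by value. Option (b), materializing a temporary when the argument is not addressable, could look roughly like the sketch below. It is a simplified model and not the MIR lowering code: `Expr`, `Lowerer`, and `AddrOf` are hypothetical, and expressions are represented as plain strings.

```go
package main

import "fmt"

// Expr is a hypothetical stand-in for a lowered expression.
type Expr struct {
	Text        string // source-like rendering of the expression
	Addressable bool   // true for locals/places, false for calls and temporaries
}

// Lowerer collects the statements emitted so far and hands out temp names.
type Lowerer struct {
	Stmts []string
	next  int
}

// AddrOf returns an address expression for e. Non-addressable expressions are
// first stored into a fresh temporary so that an address can be taken.
func (l *Lowerer) AddrOf(e Expr) string {
	if e.Addressable {
		return "&" + e.Text
	}
	tmp := fmt.Sprintf("tmp%d", l.next)
	l.next++
	l.Stmts = append(l.Stmts, tmp+" := "+e.Text)
	return "&" + tmp
}

func main() {
	l := &Lowerer{}
	fmt.Println(l.AddrOf(Expr{Text: "chars", Addressable: true}))
	fmt.Println(l.AddrOf(Expr{Text: "str_chars(&s)", Addressable: false}))
	fmt.Println(l.Stmts)
}
```

Compared with passing by value, this keeps the runtime helper's reference-based signature unchanged at the cost of one extra local per temporary argument.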

Comment thread CURRENT_SEMANTICS.md Outdated
- explicit copy/view materialization helpers (`str_bytes`, `bytes_str`, `str_chars`) remain available
- direct allocating casts such as `str as []char` and `[]char as str` are rejected
- numeric formatting and `[]char` encoding use `to_str<T: Stringable>(value)`, not `as str`
- types that provide `String() -> str` can use `value as str`

Copilot AI Apr 22, 2026


The semantics doc says “types that provide String() -> str can use value as str”, but the language’s string-method coercion uses a receiver form (e.g. fn Name::String(self) -> str). Please update the documentation to reflect the actual required signature (include the self receiver) so users don’t implement an incorrect method shape.

Suggested change
- types that provide `String() -> str` can use `value as str`
- types that provide `fn Name::String(self) -> str` can use `value as str`


Copilot AI left a comment


Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

Comment on lines +2136 to +2141
func (c *checker) isStringableType(typ typeinfo.Type) bool {
    return c.isStringType(typ) ||
        typeinfo.IsNumeric(c.underlying(typ)) ||
        c.isCharSliceType(typ) ||
        c.isStringMethodCoercion(&typeinfo.StringType{}, typ)
}

Copilot AI Apr 22, 2026


isStringableType doesn’t unwrap ApproxType ("~T") before checking IsNumeric, so values typed as e.g. ~i32/~u64 won’t satisfy the new Stringable constraint even though Comparable and other type logic treat ApproxType as a transparent wrapper. Consider handling *typeinfo.ApproxType (and delegating to its Inner) so to_str works consistently for approximate numeric types.
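A minimal sketch of the unwrapping this comment asks for is shown below. The type names echo those mentioned in the review (`ApproxType`, `BuiltinType`), but the definitions here are simplified stand-ins, and `unwrapApprox`, `IsNumeric`, and the numeric name list are assumptions for illustration, not the `typeinfo` package's API.

```go
package main

import "fmt"

// Type is a simplified stand-in for a compiler type node.
type Type interface{ String() string }

// BuiltinType models a named builtin such as i32.
type BuiltinType struct{ Name string }

func (b *BuiltinType) String() string { return b.Name }

// ApproxType models "~T", a transparent wrapper around Inner.
type ApproxType struct{ Inner Type }

func (a *ApproxType) String() string { return "~" + a.Inner.String() }

// unwrapApprox strips any number of ApproxType layers.
func unwrapApprox(t Type) Type {
	for {
		a, ok := t.(*ApproxType)
		if !ok || a.Inner == nil {
			return t
		}
		t = a.Inner
	}
}

// IsNumeric checks the unwrapped type, so ~i32 is treated like i32.
// The set of names is an assumption made for this example.
func IsNumeric(t Type) bool {
	b, ok := unwrapApprox(t).(*BuiltinType)
	if !ok {
		return false
	}
	switch b.Name {
	case "i32", "i64", "u64", "usize", "f64":
		return true
	}
	return false
}

func main() {
	approx := &ApproxType{Inner: &BuiltinType{Name: "i32"}}
	fmt.Println(approx, IsNumeric(approx))
	fmt.Println(IsNumeric(&ApproxType{Inner: &BuiltinType{Name: "str"}}))
}
```

Doing the unwrap in one shared helper would let the typechecker and the lowering pass agree on what counts as numeric, which is the consistency both review comments are after.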

Comment thread internal/ir/mir/lower.go
Comment on lines +1048 to +1051
if c != nil {
    if family, _, ok := typeinfo.NumericInfo(argType); ok {
        targetType := typeinfo.Type(&typeinfo.BuiltinType{Name: "i64"})
        linkName := "ferret_global_i64_str"

Copilot AI Apr 22, 2026


lowerBuiltinToStrCallValue uses typeinfo.NumericInfo(argType) directly. If argType can be wrapped (e.g. ApproxType) or otherwise not a plain *BuiltinType, this will fail to lower to_str calls and may leave an unresolved extern to_str call in MIR/LLVM. Consider unwrapping to the underlying numeric type before calling NumericInfo (matching the typechecker’s notion of numeric/stringable).


// Converts Stringable values to text when formatting or encoding is required.
// Numeric types, str, []char, and types with String(self) -> str satisfy Stringable.
// Use `as` for zero-copy reshapes.

Copilot AI Apr 22, 2026


The to_str docs here recommend using as for zero-copy reshapes, but don’t mention that []u8 as str is a trusted view that does not validate UTF-8 (per CURRENT_SEMANTICS). Consider adding an explicit warning in this comment block so callers don’t assume the cast is validating/safe for arbitrary bytes.

Suggested change
// Use `as` for zero-copy reshapes.
// Use `as` for zero-copy reshapes.
// Warning: `[]u8 as str` creates a trusted view and does not validate UTF-8.
// Cast arbitrary bytes only when they are already known to be valid UTF-8.

Unwrap ApproxType when checking Stringable and when lowering builtin
to_str so that numeric approximations use the underlying integer type.
Add unit tests for the typechecker and LLVM lowering. Add a warning to
the global.fer doc comment about unsafe []u8 as str casts.
itsfuad merged commit eee6f6a into main on Apr 22, 2026
2 checks passed