Skip to content

Conversation

@zjmarlow
Copy link

@zjmarlow zjmarlow commented Jan 9, 2026

draft pull request with minimal changes to handle eol. also updates is-chunked check to include the case where multiple Transfer Encodings are present (chunked should always be last).

requesting feedback before final changes.

@lizmat
Copy link
Collaborator

lizmat commented Jan 12, 2026

Appears to be failing 2 tests for me on MacOS:

=== t/020-message.rakutest
1..21
ok 1 - new 1/4
ok 2 - new 2/4
ok 3 - new 3/4
ok 4 - new 4/4
ok 5 - push-field 1/2
ok 6 - push-field 2/2
ok 7 - add-content 1/2
ok 8 - add-content 2/2
ok 9 - remove-field 1/1
ok 10 - parse 1/4
ok 11 - parse 2/4
ok 12 - parse 3/4
ok 13 - parse 4/4
not ok 14 - Str 1/2
# Failed test 'Str 1/2'
# at t/020-message.rakutest line 41
# expected: 'a: b, c, d
# 
# line
# '
#      got: 'a: b, c, d
# Content-Length: 4
# 
# line'
not ok 15 - Str 2/2
# Failed test 'Str 2/2'
# at t/020-message.rakutest line 42
# expected: 'a: b, c, d

line
'
#      got: 'a: b, c, d
Content-Length: 4

line'
ok 16 - clear 1/2
ok 17 - clear 2/2
ok 18 - parse complex 1/3
ok 19 - parse complex 2/3
ok 20 - parse complex 3/3
# Subtest: charset
    ok 1 - dumb default charset
    ok 2 - default text charset
    ok 3 - default "non-text" charset
    ok 4 - explicity charset
    1..4
ok 21 - charset
# You failed 2 tests of 21

@zjmarlow
Copy link
Author

Should HTTP::UserAgent be producing valid Messages, or is it up to the users of the library to use it in such a way that it only produces valid messages? Some of the test messages are themselves invalid, so what needs changing will depend on the library's purpose.

@lizmat
Copy link
Collaborator

lizmat commented Jan 16, 2026

Well, that is a good question. What this module really needs, is a maintainer.

Would you be willing to take on that role? You apparently have a need to fix things in it :-)

@zjmarlow
Copy link
Author

Yes, but I'll need guidance and feedback as I maintain it, if that's okay?

Some first questions:

  • should the module or the module users be responsible for making sure messages generated are valid, and
  • what should be done when an invalid message is received:
    • nothing / ignore
    • parse as much as possible and silently ignore the rest
    • warning
    • exception
    • optional error handler callback that takes the message as an argument
    • other?

I'll start looking through the other open issues, too.

@zjmarlow
Copy link
Author

What would the best channel / medium be for discussing this module?

@lizmat
Copy link
Collaborator

lizmat commented Jan 19, 2026

Either #raku on libera.chat or on Discord

@zjmarlow
Copy link
Author

zjmarlow commented Jan 24, 2026

There are enough changes to get things conformant that it would be difficult not to break existing behavior. My plan is to provide alternate classes that can be imported with:

use HTTP::UserAgent 'strict';

if that is acceptable, though I'm open to other options.

@lizmat
Copy link
Collaborator

lizmat commented Jan 24, 2026

That would make it opt-in, and as such backward compatible. 👍

@zjmarlow
Copy link
Author

zjmarlow commented Jan 25, 2026

I wasn't able to get exporting to work how I was hoping, so instead, the strict versions are just named after the originals with "-Strict" attached. They are subclasses of the originals. The new classes are:

  • HTTP::UserAgent-Strict
  • HTTP::Message-Strict
  • HTTP::Request-Strict
  • HTTP::Response-Strict
  • HTTP::Header-Strict
  • HTTP::Header::ETag (HTTP::Header::Field with a "weak" attribute)

They are all made available with

use HTTP::UA-Strict;

so that it is easy to just import the strict classes. Behavior when mixing strict and original classes is "undefined".

The only change to the original should be the is-chunked multi methods in the HTTP::Message class. One checks the message's own headers, the other takes an arbitrary header object.

The following new testcases were added:

  • 011-headers-strict.rakutest
  • 021-message-issue-226.rakutest
  • 042-request-issue-226.rakutest
  • 051-response-issue-226.rakutest

I still need to do some testing with the HTTP::UserAgent-Strict class before turning this draft request into a full pull request, but this should be a good place for review and verifying testcases (the new behavior passed for me locally across Windows, Linux, and Mac).

@zjmarlow
Copy link
Author

zjmarlow commented Jan 26, 2026

need to fix body-less messages. will comment again once fixed.
fix committed.

unit class HTTP::Cookies;

use HTTP::Cookie;
use HTTP::Response:auth<zef:raku-community-modules>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these were qualified like this because there are un-related packages that provide similarly named modules, so I'd be careful about removing that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the removal of this will requires an explanation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that one. It is added back now. I wasn't aware of the similarly named modules.

@zjmarlow
Copy link
Author

My apologies. I am now reading through the META6.json and distributions documentation and committed some fixes. Will the version's patch number eventually need to be updated?

@zjmarlow
Copy link
Author

Okay, the META6.json should be fixed. I did not change the version number.

@zjmarlow
Copy link
Author

Should this draft be marked ready for review / become a regular pull request? Are there any other changes that would be good before doing so?

@zjmarlow zjmarlow marked this pull request as ready for review January 29, 2026 10:06
Copy link
Member

@jonathanstowe jonathanstowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only major suggestion I have is really a matter of personal taste: I'm not keen on the -Strict naming. I'm not sure I can completely explain why, but to my eye it would be more regular and consistent if the new classes were named HTTP::UserAgent::Strict, HTTP::Message::Strict and so forth. If nothing else it may make it easier to adapt existing code.

self.get-connection($request, $host, $port)
}

my $https_lock = Lock.new;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether it makes any difference in this case but having this as a my scoped variable in the class body means that every instance in the same process will get the same Lock object. A program might want multiple UA instances with different parameters for some reason.

It could be an attribute that is populated in TWEAK or even with a custom accessor to populate it lazily I suppose.

multi method get-connection(HTTP::Request-Strict $request, Str $host, Int $port? --> Connection-Strict:D) {
my $conn;
if $request.scheme eq 'https' {
$https_lock.lock;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking it doesn't matter here (the try require ::("IO::Socket::SSL") can't throw an exception,) but generally :

$https_lock.protect: {
    try require ::("IO::Socket::SSL");
};

might be preferred: if nothing else it is less typing and another developer doesn't have to scan the code to find the matching unlock.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point of locking here period

@zjmarlow
Copy link
Author

zjmarlow commented Feb 3, 2026

What is the process for completing the pull request / merging?

@jonathanstowe
Copy link
Member

What is the process for completing the pull request / merging?

I'm not sure there is a strict process, but I'd like one or two more reviews before pushing the button 🤣

@zjmarlow
Copy link
Author

zjmarlow commented Feb 3, 2026

Fair enough. I wasn't sure how it worked. =)

@jonathanstowe jonathanstowe requested review from lizmat and ugexe February 3, 2026 18:18
@jonathanstowe
Copy link
Member

I've added @lizmat and @ugexe as they were the two most recent people to do a release of this.

@lizmat lizmat removed their request for review February 3, 2026 18:47
@lizmat
Copy link
Collaborator

lizmat commented Feb 3, 2026

I only prepared this module for an initial community modules release. I have currently no affinity with this module at all, so removed myself as a reviewer.

}

sub _clear-url(Str $url is copy) {
our sub _clear-url(Str $url is copy) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these subs changed to our scope?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that they could be used in the Strict classes. I can copy the subs over to the new classes, if that is better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that I'd duplicate code, so I'm not suggesting an alternative. But here you are essentially making this function part of the public API, i.e. users can now call HTTP::UserAgent::_clear-url(...). You could mark is is-implementation-detail but even that feels like a code smell.

@ugexe
Copy link
Member

ugexe commented Feb 3, 2026

I haven't really looked closely at the code, but my initial thought a strict mode would be implemented in one of two ways:

  1. A flag on the various existing objects that signal calling something like a .validate() function

  2. Allow passing in the HTTP:: classes (that fulfill some role) used by HTTP::UserAgent similar to what Net::HTTP allows

Option 2 also allows for those e.g. HTTP::Header::Strict classes to be part of their own distribution. Does the way this is implement have advantages over doing it one of those ways? Again I haven't looked closely at the code so I'm genuinely asking.

I also agree with the other comment that the naming seems a bit weird. Specifically I think the existence of a duplicate UserAgent class at all is what feels off.

@zjmarlow
Copy link
Author

zjmarlow commented Feb 3, 2026

I don't think this approach has an advantage over option 1, so I will look back over the changes in the new classes and see about having different methods used based on a flag provided in the original classes. The flag would need to be provided for the UserAgent, as well as the Message classes. I am not familiar with Net::HTTP but will take a look at that approach.

The purpose of these changes is to address invalid output, as opposed to checking for invalid input. The original code was appending an extra CRLF to requests, which some browser driver implementations (implementing the WebDriver protocol) do not accept, specifically for POST requests.

@zjmarlow
Copy link
Author

zjmarlow commented Feb 5, 2026

I am working through option 1. The general approach is to have a :$strict parameter in the constructor of each of the affected classes. In addition, affected methods also accept a strict flag. Then strict instances will pass strict to method calls with the final strict setting for methods being $strict || $!strict. It also means methods can be invoked with strict individually (not sure if that's a use case).

@zjmarlow
Copy link
Author

zjmarlow commented Feb 6, 2026

Flag implementation committed. Since the issue to address was mainly output, strict expects valid input, but does not validate it. Meaning the #226 case would probably want a non-strict UserAgent with a strict Request. That combination has not been tested yet and I am working on testcases to cover the various mixed cases. The latest commit also removes the ::Strict classes and reverts the our subs back to their original.

@patrickbkr
Copy link
Member

@zjmarlow Can you give me a bit more context? I can't recall what you're referring to / in what way I've been involved in this issue.

@zjmarlow
Copy link
Author

zjmarlow commented Feb 6, 2026

Sorry, I was too quick with the auto-complete. It was meant to be in reference to #226 , which was opened by bbkr.

@zjmarlow
Copy link
Author

zjmarlow commented Feb 7, 2026

Testcases added for the various UserAgent and Request strict / non-strict interactions.

Copy link
Member

@ugexe ugexe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's quite a lot of changes, nice! Unfortunately making lots of changes also has drawbacks... would it be possible to clean up this PR further? There are multiple commits fixing things broken in/from previous commits which makes reviewing things commit by commit frustrating. The PR title is still "preliminary changes" and is described as being a draft pull request (among other things), which makes understanding the git history more difficult. There are also many leftover comments in the code itself.

I left quite a few comments, but that was just from a cursory review so I suspect there are things I missed. Hopefully it is not too discouraging... you've bitten off a rather large piece of work.


method new($content?, *%fields) {
my $header = HTTP::Header.new(|%fields);
method new($content?, Bool :$strict, *%fields) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a bit of an unfortunate api design. What if I want to send a header named strict?

# pop zero-length Str that occurs after last chunk
# what to do if this doesn't happen?
@lines.pop if @lines %2;
@lines = grep *,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of grepping on *? It is also hard to read this chain since it is mixing routine and method calls.

}

method parse($raw_message) {
method !parse ( $raw_message ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!parse isn't a great name for this function, especially given there is a public function of the same name. !parse-strict or some such would make it more clear what the purpose of this function is.

Also it seems like there is a lot of logic that could be shared between !parse and parse. Refactoring these would make it easier to maintain.

# technically incorrect - content allowed to contain embedded CRLFs
my @lines = $content.split: $CRLF;
# pop zero-length Str that occurs after last chunk
# what to do if this doesn't happen?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, what should this do?

Copy link
Author

@zjmarlow zjmarlow Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option would be: if the user specified throw-exceptions, to throw with 'truncated last chunk'. The changes are meant to focus on producing strict output rather than demand strict input, so I'm not sure about this - your thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a strange design to only have special http header objects for one type of header. Is there any reason this needs its own object? The motivation for it isn't clear to me as the commit message on this file is just "initial strict implementation".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ETags can be tagged as weak validators and might be handled differently in those cases.

# headers container
has @.fields;

grammar Grammar::Strict {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other grammar in this file is called HTTP::Header::Grammar which makes it clear what it is used for. To me a name like Grammar::Strict in the same file does not suggest it has any relation to HTTP::Header::Grammar even though it appears that is the case based on the parse method that returns one of these classes.

$.file = '/' ~ $.file unless $.file.starts-with: '/';
my $s = "$.method $.file $.protocol";
$s ~= $CRLF ~ callwith($CRLF, :$debug, :$bin);
join $CRLF, $s, callwith $CRLF, :$debug, :$bin, :$strict;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit pick, but it would be easier to review the important changes if stylistic changes were left out.

self.print($request.Str(:bin));
self.write($request.content);
}
elsif $strict or $request.strict {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect any strict checks to apply regardless of if the request is binary or not


multi method post(Str $uri is copy, %form, Bool :$bin, *%header ) {
self.post(URI.new(_clear-url($uri)), %form, |%header)
# should :$bin also be passed along?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, should it?

@zjmarlow
Copy link
Author

zjmarlow commented Feb 8, 2026

Hello @ugexe , thank you for taking the time to do this. Would you prefer a separate PR with only the final changes committed for any further reviews? I have some other work to take care of so it will likely be over the next few days that I get through the change requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants