Currently, `Xeno.DOM.Robust` does not properly handle XML doctypes.
Doctypes are removed if they appear at the start of the document. However, a doctype is usually placed after the XML declaration: `<?xml ...><!DOCTYPE html>`.
For example, this test fails:

    describe "skipDoctype" $ do
      it "strips doctype after xml declaration" $ do
        skipDoctype "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html>Hello" `shouldBe` "<?xml version=\"1.0\" encoding=\"UTF-8\"?>Hello"

One option is for `skipDoctype` to check whether either of the first two `<` characters is followed by `!DOCTYPE` and, if so, remove that node.
I don't think supporting a doctype at the end of a document is worth bothering with.
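The following Haskell sketch illustrates the "first two `<`" idea over `Data.ByteString.Char8` from the `bytestring` package. The names `skipDoctypeSketch`, `isDoctype`, `dropDoctype` and `skipBlanks` are hypothetical and are not xeno's actual API. The sketch assumes the doctype has no internal subset, so it simply drops everything through the first `>`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as S8
import Data.ByteString (ByteString)
import Data.Char (isSpace)

-- | Hypothetical replacement for skipDoctype (not xeno's API): inspects
-- the first two markup nodes and removes whichever one is a doctype.
skipDoctypeSketch :: ByteString -> ByteString
skipDoctypeSketch input
  -- First node is the doctype: drop it.
  | isDoctype body = dropDoctype body
  -- First node is an XML declaration: keep it and check the second node.
  | "<?xml" `S8.isPrefixOf` body =
      let (decl, rest) = S8.breakSubstring "?>" body
          afterDecl = skipBlanks (S8.drop 2 rest)
      in if isDoctype afterDecl
           then decl <> S8.take 2 rest <> dropDoctype afterDecl
           else input
  -- No doctype among the first two nodes: leave the input untouched.
  | otherwise = input
  where
    body = skipBlanks input

isDoctype :: ByteString -> Bool
isDoctype = S8.isPrefixOf "<!DOCTYPE"

-- | Drops a simple doctype through its closing '>' (assumes no internal subset).
dropDoctype :: ByteString -> ByteString
dropDoctype = skipBlanks . S8.drop 1 . S8.dropWhile (/= '>')

skipBlanks :: ByteString -> ByteString
skipBlanks = S8.dropWhile isSpace
```

With the input from the failing test, this returns `<?xml version="1.0" encoding="UTF-8"?>Hello`, and a doctype at the very start (`<!DOCTYPE html><root/>`) is still stripped to `<root/>`.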
In addition, `skipDoctype` does not handle doctypes with internal subsets, such as:

    <!DOCTYPE html [
    <!-- an internal subset can be embedded here -->
    ]>

A corresponding test:

    describe "skipDoctype" $ do
      it "strips doctype with internal subsets" $ do
        skipDoctype "<!DOCTYPE html [ <!-- --> ]><?xml version=\"1.0\" encoding=\"UTF-8\"?>Hello" `shouldBe` "<?xml version=\"1.0\" encoding=\"UTF-8\"?>Hello"

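In this case, `skipDoctype` returns a `ByteString` that starts with `]>`, since it stops at the first `>`, which here closes the `<!-- -->` comment.
Ideally, `skipDoctype` should drop input until it reaches `[` or `>`; if it matched `[`, it should continue dropping until `]>`.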
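A minimal Haskell sketch of that drop-until-`[`-or-`>` strategy, again over `Data.ByteString.Char8`. The names `skipDoctypeWithSubset`, `dropDoctypeNode` and `skipBlanks` are hypothetical, not xeno's API. Besides `]>`, it also accepts whitespace between `]` and `>`, which the XML 1.0 `doctypedecl` grammar permits.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Char8 as S8
import Data.ByteString (ByteString)
import Data.Char (isSpace)

-- | Hypothetical sketch (not xeno's API): removes a leading <!DOCTYPE ...>
-- node, including an optional internal subset in square brackets.
skipDoctypeWithSubset :: ByteString -> ByteString
skipDoctypeWithSubset input
  | "<!DOCTYPE" `S8.isPrefixOf` body = skipBlanks (dropDoctypeNode body)
  | otherwise = input
  where
    body = skipBlanks input

-- | Drops everything up to and including the end of the doctype node.
dropDoctypeNode :: ByteString -> ByteString
dropDoctypeNode bs =
  -- Stop at whichever of '[' or '>' comes first.
  case S8.uncons (S8.dropWhile (\c -> c /= '[' && c /= '>') bs) of
    Just ('[', subset) -> closeSubset subset
    Just (_, after) -> after -- a plain '>' ends a doctype without a subset
    Nothing -> S8.empty -- unterminated doctype
  where
    -- Inside the subset: find ']' followed by optional blanks and then '>'.
    closeSubset s =
      let afterBracket = S8.drop 1 (S8.dropWhile (/= ']') s)
      in case S8.uncons (skipBlanks afterBracket) of
           Just ('>', after) -> after
           Just _ -> closeSubset afterBracket -- this ']' does not close the doctype
           Nothing -> S8.empty -- unterminated internal subset

skipBlanks :: ByteString -> ByteString
skipBlanks = S8.dropWhile isSpace
```

On the internal-subset test input, this returns `<?xml version="1.0" encoding="UTF-8"?>Hello`. It does not look inside comments or quoted literals within the subset, so a `]>` sequence inside, for example, `<!-- ]> -->` would still end the doctype early.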