Doctype after <?xml> or with internal subsets results in parse errors

Currently, `Xeno.DOM.Robust` does not properly handle XML doctypes.

Doctypes are removed only if they appear at the very start of the document. Usually, however, the doctype is placed after the XML declaration: `<?xml ...?><!DOCTYPE html>`.

For example, this test fails:
```hs
describe "skipDoctype" $ do
  it "strips doctype after xml declaration" $ do
    skipDoctype "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html>Hello" `shouldBe` "<?xml version=\"1.0\" encoding=\"UTF-8\"?>Hello"
```
One idea is for `skipDoctype` to inspect the first two tags in the document and, if either of them begins with `<!DOCTYPE`, remove that node.
I don't think a doctype at the end of a document is worth supporting.
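
The idea above could be sketched roughly as follows. This is only an illustration, not the library's code: `splitXmlDecl`, `dropSimpleDoctype`, and `skipDoctypeAfterDecl` are hypothetical names, the `<!DOCTYPE` match is assumed to be case-sensitive, and doctypes with internal subsets are deliberately ignored here.

```hs
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS
import Data.Char (isSpace)

-- | Hypothetical helper: split off a leading XML declaration, if present.
-- Returns (declaration including the closing "?>", remainder).
splitXmlDecl :: BS.ByteString -> (BS.ByteString, BS.ByteString)
splitXmlDecl bs
  | "<?xml" `BS.isPrefixOf` bs =
      let (before, after) = BS.breakSubstring "?>" bs
      in if BS.null after
           then (BS.empty, bs)                 -- unterminated declaration: leave input untouched
           else (before <> "?>", BS.drop 2 after)
  | otherwise = (BS.empty, bs)

-- | Hypothetical helper: drop a doctype without an internal subset.
-- Leading whitespace is only consumed when a doctype actually follows.
dropSimpleDoctype :: BS.ByteString -> BS.ByteString
dropSimpleDoctype bs
  | "<!DOCTYPE" `BS.isPrefixOf` stripped = BS.drop 1 (BS.dropWhile (/= '>') stripped)
  | otherwise = bs
  where
    stripped = BS.dropWhile isSpace bs

-- | Keep the XML declaration (if any) and strip a doctype that follows it.
skipDoctypeAfterDecl :: BS.ByteString -> BS.ByteString
skipDoctypeAfterDecl bs =
  let (decl, rest) = splitXmlDecl bs
  in decl <> dropSimpleDoctype rest
```

With this sketch, a doctype at the very start of the input is still stripped, because `splitXmlDecl` returns an empty declaration in that case.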


In addition, `skipDoctype` does not handle doctypes with internal subsets, such as:
```xml
<!DOCTYPE html [
  
]>
``` 
Appropriate test:
```hs
describe "skipDoctype" $ do
  it "strips doctype with internal subsets" $ do
    skipDoctype "<!DOCTYPE html [  ]><?xml version=\"1.0\" encoding=\"UTF-8\"?>Hello" `shouldBe` "<?xml version=\"1.0\" encoding=\"UTF-8\"?>Hello"
```
In this case, `skipDoctype` returns a ByteString that starts with `]>`.
Ideally, `skipDoctype` should drop input up to the first `[` or `>`, and, if it stopped at a `[`, continue dropping through the closing `]>`.
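
A minimal sketch of that dropping strategy might look like the following. `dropDoctype` is a hypothetical name, not an existing function in the library. The sketch assumes the closing `]>` is written without whitespace between the two characters and does not account for `[`, `>`, or `]>` appearing inside quoted literals.

```hs
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BS

-- | Hypothetical sketch: drop a leading doctype, with or without an internal subset.
-- Input that does not start with "<!DOCTYPE", or whose doctype is unterminated,
-- is returned unchanged.
dropDoctype :: BS.ByteString -> BS.ByteString
dropDoctype bs
  | not ("<!DOCTYPE" `BS.isPrefixOf` bs) = bs
  | otherwise =
      -- Skip ahead to whichever comes first: '[' (internal subset) or '>' (end of doctype).
      let rest = BS.dropWhile (\c -> c /= '[' && c /= '>') bs
      in case BS.uncons rest of
           Just ('>', after) -> after            -- simple doctype: done
           Just ('[', _) ->
             -- Internal subset: keep dropping through the closing "]>".
             let (_, tailPart) = BS.breakSubstring "]>" rest
             in if BS.null tailPart then bs else BS.drop 2 tailPart
           _ -> bs                               -- no terminator found
```

Returning the input unchanged for an unterminated doctype is a design choice in this sketch; it leaves error reporting to the parser that runs afterwards.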
