Description
I have a client I can't change, which submits a malformed SOAP document. The client assumes that a namespace with the prefix soap and the URI http://schemas.xmlsoap.org/soap/ is already declared. Who knows why the original server accepted this. I'm not in a position to judge or fix it, only to make something bug-for-bug compatible.
Let's demonstrate the behavior in a simplified way.
from lxml import etree
s = "<soapenv:Header><soap:authentication><soap:username>some_user</soap:username><soap:password>some_pass</soap:password></soap:authentication></soapenv:Header>"
etree.fromstring(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1777, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
File "<string>", line 1
etree.XMLSyntaxError: Namespace prefix soapenv on Header is not defined, line 1, column 16
I had originally gone down the path of using the listeners, like before_deserialize, but all of those seem to fire after lxml's etree.XMLID tries to parse the document and throws a fault (spyne/spyne/protocol/soap/soap11.py, line 112 in 3547234):
root, xmlids = etree.XMLID(string.encode(charset), parser)
I ended up monkey patching Soap11's _parse_xml_string to do some nasty string manipulation with a few regexes: check whether the namespace is declared in the envelope and, if not, modify the incoming string to include it.
- Will this fail on super large payloads? Probably (my payloads are less than 4k, it's not going to be an issue)
- Do I think this was the right thing to do? Not really, but the POC works.
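For anyone who wants to see the shape of that patch, here is a minimal sketch of the string manipulation described above. It is not the exact code from my monkey patch: the helper name inject_missing_namespaces and the ASSUMED_NAMESPACES mapping are made up for illustration, and the regex assumes the first start tag in the payload is the root element (a comment or CDATA section containing something tag-like ahead of it would fool it).

```python
import re

# Prefix -> URI pairs the client assumes are predeclared (hypothetical
# mapping; only the "soap" entry comes from the report above).
ASSUMED_NAMESPACES = {"soap": "http://schemas.xmlsoap.org/soap/"}

# First start tag in the text: name, raw attribute text, and the closing
# ">" or "/>". "<?xml ...?>" and "<!-- ... -->" are skipped because the
# name must begin with a letter or underscore.
_ROOT_TAG = re.compile(r"<([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)((?:\s[^<>]*?)?)(/?>)")


def inject_missing_namespaces(xml_text, namespaces=ASSUMED_NAMESPACES):
    """Declare assumed prefixes on the root tag if they are used but missing."""
    match = _ROOT_TAG.search(xml_text)
    if match is None:
        return xml_text  # nothing that looks like a start tag; leave as-is
    name, attrs, close = match.groups()
    extra = ""
    for prefix, uri in namespaces.items():
        used = ("<%s:" % prefix) in xml_text
        declared = re.search(r"\sxmlns:%s\s*=" % re.escape(prefix), attrs)
        if used and not declared:
            extra += ' xmlns:%s="%s"' % (prefix, uri)
    if not extra:
        return xml_text
    new_tag = "<%s%s%s%s" % (name, extra, attrs, close)
    return xml_text[:match.start()] + new_tag + xml_text[match.end():]
```

The patched string can then be handed to the parser as usual. Running the helper twice is harmless, since the second pass finds the declaration on the root tag and returns the input unchanged.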
It makes sense that an invalid XML doc would cause a failure, but I'm wondering if there are other strategies I should be considering instead of this approach. Or perhaps the intent of those listeners in the Soap11 object was to help cope with bad documents, and the functionality was lost over time. If it's the former, I'm all ears. If it's the latter, I'm not totally sure how I would fix it.
I could see a path where create_in_document fires an event that allows you to manipulate the XML string before ctx.in_document gets a parsed XML object, and I would take a swing at a PR for that functionality if anyone sees merit in it.
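To make the proposal concrete, here is a rough, standalone sketch of the idea. Nothing in it is spyne's actual API: RawDocumentPipeline and add_listener are hypothetical names, and the standard library's xml.etree.ElementTree stands in for lxml so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET


class RawDocumentPipeline:
    """Hypothetical stand-in for a protocol's create_in_document step."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, handler):
        # handler takes the raw XML text and returns (possibly modified) text.
        self._listeners.append(handler)

    def create_in_document(self, raw_text):
        # Fire the "before parse" listeners first, then parse the result.
        for handler in self._listeners:
            raw_text = handler(raw_text)
        return ET.fromstring(raw_text)
```

A listener registered this way would see the payload while it is still a string, so a fix-up like the namespace injection could run without monkey patching the parse function.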
Thoughts?