
TimeMachine: xml2sql java.lang.NullPointerException #225

@FelixDrinkall


I am getting the following error:

INFO [main] (Log4jLogger.java:28) - Pagelinks 1527700000
INFO [main] (Log4jLogger.java:28) - Processing the text table
Exception in thread "xml2sql" java.lang.NullPointerException
        at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.sql.SQLEscape.escape(SQLEscape.java:37)
        at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.TextWriter.writeRevision(TextWriter.java:55)
        at de.tudarmstadt.ukp.wikipedia.mwdumper.importer.PageFilter.writeRevision(PageFilter.java:67)
        at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.closeRevision(AbstractXmlDumpReader.java:548)
        at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.endElement(AbstractXmlDumpReader.java:338)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
        at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
        at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
        at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
        at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
        at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.readDump(AbstractXmlDumpReader.java:205)
        at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.XMLDumpTableInputStreamThread.run(XMLDumpTableInputStreamThread.java:90)
INFO [main] (Log4jLogger.java:28) - Write end dead

java.base/java.io.PipedInputStream.read(PipedInputStream.java:310)
java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.XMLDumpTableInputStream.read(XMLDumpTableInputStream.java:83)
java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
de.tudarmstadt.ukp.wikipedia.wikimachine.util.UTFDataInputStream.readUTFAsArray(UTFDataInputStream.java:73)
de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.TextParser.next(TextParser.java:69)
de.tudarmstadt.ukp.wikipedia.wikimachine.domain.DumpVersionProcessor.processText(DumpVersionProcessor.java:153)
de.tudarmstadt.ukp.wikipedia.timemachine.domain.TimeMachineGenerator.processInputDumps(TimeMachineGenerator.java:133)
de.tudarmstadt.ukp.wikipedia.timemachine.domain.TimeMachineGenerator.start(TimeMachineGenerator.java:109)
de.tudarmstadt.ukp.wikipedia.timemachine.domain.JWPLTimeMachine.main(JWPLTimeMachine.java:83)

The command is:
java -Djdk.xml.totalEntitySizeLimit=2147483647 -Xmx512m -cp ".:./log4j.properties:./*" de.tudarmstadt.ukp.wikipedia.timemachine.domain.JWPLTimeMachine config.xml
The config.xml file is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>This a configuration for the JWPL TimeMachine</comment>
  <entry key="language">english</entry>
  <entry key="mainCategory">Contents</entry>
  <entry key="disambiguationCategory">Disambiguation_pages</entry>
  <entry key="fromTimestamp">20060101000000</entry>
  <entry key="toTimestamp">20060102000000</entry>
  <entry key="each">1</entry>
  <entry key="metaHistoryFile">/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-pages-meta-history1.xml-p1p844.bz2</entry>
  <entry key="categoryLinksFile">/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-categorylinks.sql.gz</entry>
  <entry key="pageLinksFile">/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-pagelinks.sql.gz</entry>
  <entry key="outputDirectory">/nethome/felixd/wiki_timemachine/wiki_formatted</entry>
  <entry key="removeInputFilesAfterProcessing">false</entry>
</properties>

The command seems to process the first wiki entry and then fail. In my output directory, I have a PageMapLine.txt file with the following entry:
11286 Fruitarianism 11286 NULL NULL

And I have a Page.txt file:
11286 11286 Fruitarianism [[Image:Fruit.jpg|frame|right|A selection...............

What is going on? Should I edit the SQLEscape file in the jar?
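
To make the question concrete, here is a minimal sketch of the kind of null guard I have in mind. It assumes the NullPointerException comes from a revision whose text is null being handed to the escape routine, which I have not confirmed. The class and both methods below are hypothetical stand-ins written for illustration; they are not JWPL's actual SQLEscape code.

```java
// Hypothetical illustration only: not the real
// de.tudarmstadt.ukp.wikipedia.wikimachine.dump.sql.SQLEscape.
class NullSafeEscapeSketch {

    // Stand-in escaping routine. It dereferences its argument right away,
    // so a null revision text fails with a NullPointerException.
    static String escapeUnsafe(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Prefix backslashes and single quotes with a backslash.
            if (c == '\\' || c == '\'') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Guarded variant: treats a missing revision text as an empty string
    // (an assumption about what the desired behaviour would be).
    static String escapeNullSafe(String text) {
        return text == null ? "" : escapeUnsafe(text);
    }

    public static void main(String[] args) {
        System.out.println(escapeNullSafe("it's"));
        System.out.println("[" + escapeNullSafe(null) + "]");
        try {
            escapeUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded call threw NullPointerException");
        }
    }
}
```

If the cause really is a null revision text, a guard like this would presumably belong at the call site (TextWriter.writeRevision in the trace above) or inside the escape method itself, rather than in a patched class file inside the jar.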
