I am getting the following error:
INFO [main] (Log4jLogger.java:28) - Pagelinks 1527700000
INFO [main] (Log4jLogger.java:28) - Processing the text table
Exception in thread "xml2sql" java.lang.NullPointerException
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.sql.SQLEscape.escape(SQLEscape.java:37)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.TextWriter.writeRevision(TextWriter.java:55)
    at de.tudarmstadt.ukp.wikipedia.mwdumper.importer.PageFilter.writeRevision(PageFilter.java:67)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.closeRevision(AbstractXmlDumpReader.java:548)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.endElement(AbstractXmlDumpReader.java:338)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:610)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1718)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2883)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
    at java.xml/javax.xml.parsers.SAXParser.parse(SAXParser.java:197)
    at de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.readDump(AbstractXmlDumpReader.java:205)
    at de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.XMLDumpTableInputStreamThread.run(XMLDumpTableInputStreamThread.java:90)
INFO [main] (Log4jLogger.java:28) - Write end dead
java.base/java.io.PipedInputStream.read(PipedInputStream.java:310)
java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:271)
de.tudarmstadt.ukp.wikipedia.timemachine.dump.xml.XMLDumpTableInputStream.read(XMLDumpTableInputStream.java:83)
java.base/java.io.DataInputStream.readInt(DataInputStream.java:392)
de.tudarmstadt.ukp.wikipedia.wikimachine.util.UTFDataInputStream.readUTFAsArray(UTFDataInputStream.java:73)
de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.TextParser.next(TextParser.java:69)
de.tudarmstadt.ukp.wikipedia.wikimachine.domain.DumpVersionProcessor.processText(DumpVersionProcessor.java:153)
de.tudarmstadt.ukp.wikipedia.timemachine.domain.TimeMachineGenerator.processInputDumps(TimeMachineGenerator.java:133)
de.tudarmstadt.ukp.wikipedia.timemachine.domain.TimeMachineGenerator.start(TimeMachineGenerator.java:109)
de.tudarmstadt.ukp.wikipedia.timemachine.domain.JWPLTimeMachine.main(JWPLTimeMachine.java:83)
The command is:
java -Djdk.xml.totalEntitySizeLimit=2147483647 -Xmx512m -cp ".:./log4j.properties:./*" de.tudarmstadt.ukp.wikipedia.timemachine.domain.JWPLTimeMachine config.xml
The config.xml file is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>This a configuration for the JWPL TimeMachine</comment>
<entry key="language">english</entry>
<entry key="mainCategory">Contents</entry>
<entry key="disambiguationCategory">Disambiguation_pages</entry>
<entry key="fromTimestamp">20060101000000</entry>
<entry key="toTimestamp">20060102000000</entry>
<entry key="each">1</entry>
<entry key="metaHistoryFile">/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-pages-meta-history1.xml-p1p844.bz2</entry>
<entry key="categoryLinksFile">/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-categorylinks.sql.gz</entry>
<entry key="pageLinksFile">/nethome/felixd/wiki_timemachine/wiki_raw/enwiki-latest-pagelinks.sql.gz</entry>
<entry key="outputDirectory">/nethome/felixd/wiki_timemachine/wiki_formatted</entry>
<entry key="removeInputFilesAfterProcessing">false</entry>
</properties>
The command seems to write the first wiki entry and then fail. In my output directory, I have a PageMapLine.txt with the following entry:
11286 Fruitarianism 11286 NULL NULL
And I have a Page.txt file:
11286 11286 Fruitarianism [[Image:Fruit.jpg|frame|right|A selection...............
What is going on? Should I edit the SQLEscape file in the jar?