TUTORIAL: XMLParser versus W3C DOM XML

Joe

VIP Member
21/1/13
2,969
1,311
113
Hi

Normally I am very reluctant to compare different products, particularly when one of them is mine. However, since Admin quydtkt started a series about XML and JSON, two similar Data Description Languages (DDLs), there has been some interest in both. I then found on the web a lot of "blogs" by "experts" about XML and JSON. They compared the two and made so much noise that I had to laugh, and so I started writing this post. They all state that "JSON is less verbose and faster". And that is it! They decline to tell readers how much faster, or how they measured the speed of JSON and XML. That is weird and shoddy for technical publications (on the web). In my opinion such publications are trash. Maybe I will have to write my own JSON parser so that I can contradict their funny comparisons.

If I want to compare one thing with another, I have to be versed in both. Only then can I start measuring, and only after that publicize my "conclusion" about "this" versus "that". To compare my XMLParser with the W3C DOM, I downloaded the W3C DOM package (click HERE), picked one of the DOM/XML examples on the web (click HERE), and simplified it so that readers can follow the algorithm more easily. To separate the class-loading latency from the parsing itself, each program parses the document twice and times both passes; the first pass includes class loading. The code:

The XML document: employees.xml
Code:
<?xml version="1.0" encoding="UTF-8"?>
<Employees>
     <Employee ID="1">
          <Firstname>Lebron</Firstname>
          <Lastname>James</Lastname>
          <Age>30</Age>
          <Salary>2500</Salary>
     </Employee>
     <Employee ID="2">
          <Firstname>Anthony</Firstname>
          <Lastname>Davis</Lastname>
          <Age>22</Age>
          <Salary>1500</Salary>
     </Employee>
     <Employee ID="3">
          <Firstname>Paul</Firstname>
          <Lastname>George</Lastname>
          <Age>24</Age>
          <Salary>2000</Salary>
     </Employee>
     <Employee ID="4">
          <Firstname>Blake</Firstname>
          <Lastname>Griffin</Lastname>
          <Age>25</Age>
          <Salary>2250</Salary>
     </Employee>
</Employees>
With W3C-DOM/javax.xml.parsers.*: Test_W3C.java
Code:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
// source: https://examples.javacodegeeks.com/java-xml-parser-tutorial/
// modified by Joe just for the comparison purpose
public class Test_W3C {

  public static void main(String[] args) throws Exception {
    long beg = System.nanoTime();
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document document = builder.parse(new File("employees.xml"));
    System.out.println("1st.time Parsing:"+((double)(System.nanoTime()-beg)/1000)+" microSec.");
    beg = System.nanoTime();
    builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    document = builder.parse(new File("employees.xml"));
    System.out.println("2nd.time Parsing:"+((double)(System.nanoTime()-beg)/1000)+" microSec.");

    List<String> list;
    beg = System.nanoTime();
    List<List<String>> nodes = new ArrayList<>();
    NodeList nodeList = document.getDocumentElement().getChildNodes();
    for (int i = 0, mx = nodeList.getLength(); i < mx; ++i) {
      Node node = nodeList.item(i);

      if (node.getNodeType() == Node.ELEMENT_NODE) {
        Element elem = (Element) node;
        list = new ArrayList< >();
        list.add(elem.getElementsByTagName("Firstname")
                .item(0).getChildNodes().item(0).getNodeValue());
        list.add(elem.getElementsByTagName("Lastname").item(0)
                .getChildNodes().item(0).getNodeValue());
        list.add(elem.getElementsByTagName("Age")
                .item(0).getChildNodes().item(0).getNodeValue());
        list.add(elem.getElementsByTagName("Salary")
                .item(0).getChildNodes().item(0).getNodeValue());
        nodes.add(list);
      }
    }
    System.out.println("All Nodes:"+((double)(System.nanoTime()-beg)/1000)+" microSec.\n"+
                       nodes);
  }
}
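A note on the loop above: getChildNodes() on the root element also returns the whitespace between the <Employee> tags as text nodes, which is why the code checks for Node.ELEMENT_NODE before casting. Here is a minimal sketch to see this. It assumes a hypothetical two-employee string (not the real employees.xml), and the class name ChildNodeCount is made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustration only: counts the children of the root element of a small XML string.
class ChildNodeCount {

  // Hypothetical sample laid out like employees.xml (line breaks and indentation included).
  static final String XML =
      "<Employees>\n"
    + "  <Employee ID=\"1\"><Firstname>Lebron</Firstname></Employee>\n"
    + "  <Employee ID=\"2\"><Firstname>Anthony</Firstname></Employee>\n"
    + "</Employees>";

  // Returns { all child nodes of the root, element children only }.
  static int[] count(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList children = doc.getDocumentElement().getChildNodes();
      int elements = 0;
      for (int i = 0; i < children.getLength(); ++i) {
        if (children.item(i).getNodeType() == Node.ELEMENT_NODE) ++elements;
      }
      return new int[] { children.getLength(), elements };
    } catch (Exception e) {
      // Wrapped so callers need no checked-exception handling in this sketch.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    int[] c = count(XML);
    System.out.println("child nodes: " + c[0] + ", elements: " + c[1]);
  }
}
```

With a default DocumentBuilderFactory this should report 5 child nodes but only 2 elements; the other three are the line breaks and indentation around the <Employee> tags.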
With Joe's XMLParser API: Test.java
Code:
import java.util.*;
// Joe Nartca (C)
public class Test {
  public static void main(String argv[]) throws Exception {
    long beg = System.nanoTime();
    XMLParser xml = new XMLParser("employees.xml");
    System.out.println("1st.time Parsing:"+((double)(System.nanoTime()-beg)/1000)+" microSec.");
    beg = System.nanoTime();
    xml = new XMLParser("employees.xml");
    System.out.println("2nd.time Parsing:"+((double)(System.nanoTime()-beg)/1000)+" microSec.");
    List<String> list;
    beg = System.nanoTime();
    List<String> emp = xml.getNodes("Employees");
    List<List<String>> nodes = new ArrayList<>();
    for (String E:emp) {
      list = new ArrayList<>();
      List<String[]> node = xml.getNode(E);
      for (String[] a:node) list.add(a[1]);
      nodes.add(list);
    }
    System.out.println("All Nodes:"+((double)(System.nanoTime()-beg)/1000)+" microSec.\n"+
                       nodes);
  }
}
And the results:
Code:
C:\JoeApp\DOM>javac  -d ./classes Test.java

C:\JoeApp\DOM>javac  -d ./classes Test_W3C.java

C:\JoeApp\DOM>java Test
1st.time Parsing:128498.7 microSec.
2nd.time Parsing:2559.6 microSec.
All Nodes:414.9 microSec.
[[Lebron, James, 30, 2500], [Anthony, Davis, 22, 1500], [Paul, George, 24, 2000], [Blake, Griffin, 25, 2250]]

C:\JoeApp\DOM>java Test_W3C
1st.time Parsing:156799.1 microSec.
2nd.time Parsing:4316.6 microSec.
All Nodes:2067.8 microSec.
[[Lebron, James, 30, 2500], [Anthony, Davis, 22, 1500], [Paul, George, 24, 2000], [Blake, Griffin, 25, 2250]]

C:\JoeApp\DOM>java Test_W3C
1st.time Parsing:137627.6 microSec.
2nd.time Parsing:4592.5 microSec.
All Nodes:2445.6 microSec.
[[Lebron, James, 30, 2500], [Anthony, Davis, 22, 1500], [Paul, George, 24, 2000], [Blake, Griffin, 25, 2250]]

C:\JoeApp\DOM>java Test
1st.time Parsing:141007.9 microSec.
2nd.time Parsing:1876.6 microSec.
All Nodes:384.7 microSec.
[[Lebron, James, 30, 2500], [Anthony, Davis, 22, 1500], [Paul, George, 24, 2000], [Blake, Griffin, 25, 2250]]

C:\JoeApp\DOM>
In these runs Test (Joe's XMLParser) is faster than the "official" standard W3C-DOM/javax.xml.parsers: about 1.7 to 2.4 times faster on the second parse and about 5 to 6 times faster on collecting all nodes. The first parse always includes the class-loading time. By the second parse the classes are already in memory, so both parsers are much quicker:
  • Test.java: 2nd parse 2559.6 and 1876.6 microSec.; All Nodes 414.9 and 384.7 microSec.
  • Test_W3C.java: 2nd parse 4316.6 and 4592.5 microSec.; All Nodes 2067.8 and 2445.6 microSec.
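A quick way to turn the figures above into speed-up factors is to divide each W3C time by the matching XMLParser time. The sketch below does just that with the posted numbers. Pairing the runs by their order of appearance is my own arbitrary choice, and the class name Speedup is made up for illustration:

```java
import java.util.Locale;

// Illustration only: speed-up factors from the times posted above.
class Speedup {

  // How many times longer the slow measurement took than the fast one.
  static double ratio(double slowMicros, double fastMicros) {
    return slowMicros / fastMicros;
  }

  public static void main(String[] args) {
    // { 2nd parse, All Nodes } in microseconds, one row per posted run.
    double[][] w3c = { {4316.6, 2067.8}, {4592.5, 2445.6} };
    double[][] joe = { {2559.6, 414.9}, {1876.6, 384.7} };
    for (int run = 0; run < w3c.length; ++run) {
      System.out.println(String.format(Locale.ROOT,
          "run %d: 2nd parse x%.2f, All Nodes x%.2f",
          run + 1,
          ratio(w3c[run][0], joe[run][0]),
          ratio(w3c[run][1], joe[run][1])));
    }
  }
}
```

Two runs per program is a very small sample, so these factors are only a rough indication.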
I wonder if a "JSON parser" could beat the parsing time of my XMLParser... BTW, the XMLParser source can be downloaded from HERE.
 
(cont.)
And here is an XMLParser version that mirrors Test_W3C more closely: XTest.java
Code:
import java.util.*;
// Joe Nartca (C)
public class XTest {
  public static void main(String argv[]) throws Exception {
    long beg = System.nanoTime();
    XMLParser xml = new XMLParser("employees.xml");
    System.out.println("1st.time Parsing:"+((double)(System.nanoTime()-beg)/1000)+" microSec.");
    beg = System.nanoTime();
    xml = new XMLParser("employees.xml");
    System.out.println("2nd.time Parsing:"+((double)(System.nanoTime()-beg)/1000)+" microSec.");
    List<String> list;
    beg = System.nanoTime();
    List<String> emp = xml.getNodes("Employees");
    List<List<String>> nodes = new ArrayList<>();
    for (String E:emp) {
      list = new ArrayList<>();
      list.add(xml.getValue(E, "Firstname"));
      list.add(xml.getValue(E, "Lastname"));
      list.add(xml.getValue(E, "Age"));
      list.add(xml.getValue(E, "Salary"));
      nodes.add(list);
    }
    System.out.println("All Nodes:"+((double)(System.nanoTime()-beg)/1000)+" microSec.\n"+
                       nodes);
  }
}
And the result is still better than Test_W3C:
Code:
C:\JoeApp\DOM>java XTest
1st.time Parsing:115047.4 microSec.
2nd.time Parsing:1470.3 microSec.
All Nodes:162.5 microSec.
[[Lebron, James, 30, 2500], [Anthony, Davis, 22, 1500], [Paul, George, 24, 2000], [Blake, Griffin, 25, 2250]]

C:\JoeApp\DOM>java Test
1st.time Parsing:129151.3 microSec.
2nd.time Parsing:2242.3 microSec.
All Nodes:227.7 microSec.
[[Lebron, James, 30, 2500], [Anthony, Davis, 22, 1500], [Paul, George, 24, 2000], [Blake, Griffin, 25, 2250]]

C:\JoeApp\DOM>java Test_W3C
1st.time Parsing:150127.3 microSec.
2nd.time Parsing:5335.2 microSec.
All Nodes:2069.6 microSec.
[[Lebron, James, 30, 2500], [Anthony, Davis, 22, 1500], [Paul, George, 24, 2000], [Blake, Griffin, 25, 2250]]
Conclusion: it depends on how you compare the objects. To get an objective view you need a test procedure that is repeatable at any time and anywhere. A blanket statement such as "JSON is less verbose and faster" is nontechnical, unscientific and unacademic.
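As one possible shape for such a repeatable test, here is a minimal sketch of a timing harness: it runs the task a few times untimed as warm-up, then times many runs and reports minimum, median and maximum instead of a single number. This is not the method used for the results above. The class name RepeatableTiming, the warm-up and run counts, and the small in-memory XML string are all my assumptions, and parsing from memory deliberately leaves disk I/O out of the measurement.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Illustration only: warm up, then time the same task many times.
class RepeatableTiming {

  // Runs the task `warmup` times untimed, then `runs` times timed.
  // Returns the per-run durations in microseconds, sorted ascending.
  static double[] measure(Runnable task, int warmup, int runs) {
    for (int i = 0; i < warmup; ++i) task.run();
    double[] micros = new double[runs];
    for (int i = 0; i < runs; ++i) {
      long beg = System.nanoTime();
      task.run();
      micros[i] = (System.nanoTime() - beg) / 1000.0;
    }
    Arrays.sort(micros);
    return micros;
  }

  // Median of an already sorted array.
  static double median(double[] sorted) {
    int n = sorted.length;
    return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical one-employee document, kept in memory.
    byte[] xml = ("<Employees><Employee ID=\"1\"><Firstname>Lebron</Firstname>"
        + "<Lastname>James</Lastname></Employee></Employees>")
        .getBytes(StandardCharsets.UTF_8);
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Runnable parse = () -> {
      try {
        builder.parse(new ByteArrayInputStream(xml));
      } catch (Exception e) {
        throw new IllegalStateException(e);
      }
    };
    double[] t = measure(parse, 20, 101);
    System.out.println("min " + t[0] + ", median " + median(t)
        + ", max " + t[t.length - 1] + " microSec.");
  }
}
```

Any other parser can be measured the same way by passing a different Runnable to measure(), so both candidates go through exactly the same procedure.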

Furthermore, a DDL is a language, like an OOPL (e.g. Java or C#), and NOT an implementation. If a DDL is badly implemented, that does NOT mean it is worse than any other DDL. My XMLParser implementation proves it. A comparison of JSON and XML is therefore NONSENSE when the authors don't even know how to implement a parser for the DDLs they are comparing.


The included cdj.zip contains:

  1. XMLParser.java (last version)
  2. Test.java
  3. XTest.java
  4. Test_W3C.java
  5. employees.xml
  6. w3c-dom.jar
 

An ancillary note:
As I said in the thread "XMLParser API for all CongdongJava's members", the parser is NOT a replacement for a full-fledged XML-DOM parser. If there is enough interest in this XMLParser, I will enhance it (with parallelism) and post the new version HERE. You can express your interest via my CDJ mailbox.

Joe