GUIDE: Distributed Object Database (DODB)

Joe

Hi

When you see this title you will probably sigh, "Oh no! Everyone knows that OODB is a crutch of DB design". Well, if you are an application developer who thrives and withers with whatever SQL database is prescribed to you, you should skip this blog. But I was surprised to see the blog Google IO & AlloyDB posted by hicder, its author and a member of Kipalog.com. It is about the "problems" of a distributed RDB, or DRDB (Distributed Relational Database). So what follows is a discussion with the out-and-out developers who would rather develop their own tools than scavenge the web for something usable or recyclable.

SQL and the relational DB are old. Very old. Probably older than their developers. Object-Oriented Programming is very old too. But OOP is at least younger and more versatile than SQL, the self-proclaimed fourth-generation language (4GL for short). SQL reads like plain, easy-to-understand English. Anyone can learn the basics within a few hours. For example, to list all clients in my clients table who live in the city of Saigon:
Code:
SELECT * FROM clients WHERE city='Saigon'
(more about standard SQL: [Click HERE]).

The blog of hicder highlights the hidden problems that every SQL-RDB developer confronts when the data become vast and have to be spread across different computers (or nodes). He wrote:
If developers want to split their data in multiple nodes, they need to handle it in the application layers. Meaning developers will need to be aware that their data lives in multiple "tables": "user_1", "user_2", ..., "users_10". If they want to count number of "users", they would have to construct 10 SQL queries, get the results and sum them up.

To be more specific, if you want:
Code:
SELECT COUNT(*) FROM user;
you need to manually build and execute all these queries:
Code:
SELECT COUNT(*) FROM user_1;
SELECT COUNT(*) FROM user_2;
...
SELECT COUNT(*) FROM user_10;
So the problem is the multitude of little tables spread among the nodes, a general problem of the Distributed Relational Database (DRDB for short). Such a fixed list of SQL queries is never a good idea because any expansion or reduction leads to a redesign of the list: a nightmare for those who "inherit" the application and have to maintain it. But besides the different SQL dialects, the main obstacle of RDB is the tabularization of the data. In the age of OOP the worst scenario is to deface an object before it is stored, because every element in a table loses its relationship to and its behavior toward all other elements. Worst of all is the loss of its properties. For example: the data of the objects BMW and Audi have similar elements, but the relationships, behavior and properties differ for each object as a BMW or as an Audi.

An object is an entity in Object-Oriented Programming, meaning that the data belong as a whole to the object. An object is accessed by its name, and the access always gives the complete data of the object. There is no way to access an element of an object separately without involving the object. From this viewpoint there is no need to create a table (as in RDB). However, Object-Oriented Programming Languages (OOPL) are so different and so incompatible that it is impossible to create a generic object for a generic ODB, as is possible with RDB. Storing an object to a file in an OOPL is known as serialization. The serialization format is OOPL-dependent, and hence an Object-Oriented DB is OOPL-dependent too.

With this fact in mind, an ODB can only be designed and developed in one chosen OOPL as the base, with interfaces that bridge to other OOPLs (like the JAVA-SQL interfaces to RDB). JAVA, for example, allows developers to design objects from the simplest (POJO or Plain Old Java Object) to the most complex object with full-fledged private data and methods. As long as a JAVA object and all its sub-objects are serializable, this object can be stored on any external medium. How to store the serialized objects optimally and how to retrieve them quickly constitute the foundation of an Object Database.
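To make the serialization step concrete, here is a minimal sketch of turning a POJO into a byte array and back with the standard java.io streams. The class names Car and SerialDemo are hypothetical and exist only for this illustration; they are not part of my ODB sources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical POJO, used only for this illustration.
class Car implements Serializable {
  private static final long serialVersionUID = 1L;
  final String brand;
  final int horsePower;

  Car(String brand, int horsePower) {
    this.brand = brand;
    this.horsePower = horsePower;
  }
}

final class SerialDemo {
  // Turns any Serializable object into a byte array that could be stored or sent.
  static byte[] toBytes(Serializable obj) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(obj);
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return bos.toByteArray();
  }

  // Rebuilds the object from its serialized bytes.
  static Object fromBytes(byte[] data) {
    try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return ois.readObject();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    } catch (ClassNotFoundException ex) {
      throw new IllegalStateException("class of stored object is not on the classpath", ex);
    }
  }
}
```

Calling SerialDemo.toBytes(new Car("BMW", 190)) gives a byte array, and SerialDemo.fromBytes(...) on that array returns an equivalent Car. Such a byte array is the kind of value the obj parameter of a client API would carry.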

Instead of rummaging through the Web for an adequate ODB (which you will never find, I am sure of that) you as an OOPL developer can implement your own ODB. I show you how to implement an ODB in JAVA and for JAVA applications. An ODB always has two parts:
  1. APIs for the client applications
  2. Server that provides the access and processes the data
The implementation on the client side is relatively simple. Because of the Client-Server architecture, the Client-ODB API is based on a set of ODB access methods and a network connection to the ODB server, usually a Socket or SocketChannel (hostname and port), so that the server can be either local (LAN) or remote (Internet/Web or WAN). The ODB access methods are the standard functions such as OPEN, CLOSE, LOCK, UNLOCK, READ, ADD (or WRITE), DELETE, ROLLBACK (or UNDELETE) and some status queries such as isLocked, isExisted, etc. An excerpt of such a Client API:
Java:
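import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

public class ODBConnect {
  private SocketChannel soc; // connection to the ODB server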
  /**
  constructor
  @param dbHost String, OODB Server hostname or IP
  @param port int, OODB server port
  @param pw String, User's password
  @param uID String, User's ID
  @exception Exception thrown by JAVA
  */
  // reserved command: x1F or 31 for GZIP if OO data size > 256
  public ODBConnect(String dbHost, int port, String pw, String uID) throws Exception {
    soc = SocketChannel.open(new InetSocketAddress(dbHost, port));
    soc.socket().setReceiveBufferSize(32768); // 32KB
    soc.socket().setSendBufferSize(32768);
    ...
  }
  /**
  disconnect() the connection to OODB Server and closes all opened DBs
  */
  public void disconnect( ) {
    ...
  }
  /**
  add() object.
  @param dbName String, the DB name.
  @param key String, key name
  @param obj byte array to be added
  @exception Exception thrown by JAVA
  */
  public void add(String dbName, String key, byte[] obj) throws Exception {
    ...
  }

   /**
  update() object.
  @param dbName String, the DB name.
  @param key String, key name
  @param obj byte array to be replaced
  @exception Exception thrown by JAVA
  */
  public void update(String dbName, String key, byte[] obj) throws Exception {
    ...
  }
  /**
  delete() object.
  @param dbName String, the DB name.
  @param key String, key name
  @exception Exception thrown by JAVA
  */
  public void delete(String dbName, String key) throws Exception {
    ...
  }
  ...
}
The main problem of DRDB or DODB (Distributed ODB) is that when one of the nodes crashes or goes down, the client apps are left in the dark or crash unexpectedly. The remedy is an Instant Messaging System (IMS) that broadcasts such unexpected events to all participating client apps. One of the most convenient ways to implement an IMS is to base it on [UDP-MulticastSocket] with a [Multicasting IP]. This IMS should be integrated as the listener (or Subscriber) into the Client API and as the message broadcaster (or Publisher & Subscriber) into the ODB Server. With the IMS, any server node that starts up or shuts down can send instant messages so that all listening clients can react accordingly to any new event. Click [HERE] for more about PubSub-IMS.
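As a rough sketch of the multicast idea, the following shows a publisher and a subscriber built on java.net.MulticastSocket. The group address 239.1.2.3, the port 7777, the class name OdbNotifier and the "STATE host:port" message layout are all assumptions made for this illustration; they are not the values or the format used in my implementation.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

final class OdbNotifier {
  // Assumed values, for illustration only.
  static final String GROUP = "239.1.2.3";
  static final int PORT = 7777;

  // Builds the datagram payload; the "STATE host:port" layout is an assumption.
  static byte[] encode(String state, String host, int port) {
    return (state + " " + host + ":" + port).getBytes(StandardCharsets.UTF_8);
  }

  // Extracts the text carried by a received datagram.
  static String decode(DatagramPacket p) {
    return new String(p.getData(), p.getOffset(), p.getLength(), StandardCharsets.UTF_8);
  }

  // Publisher side (ODB server): sends one message to the multicast group.
  static void publish(byte[] payload) throws IOException {
    try (MulticastSocket soc = new MulticastSocket()) {
      soc.send(new DatagramPacket(payload, payload.length, InetAddress.getByName(GROUP), PORT));
    }
  }

  // Subscriber side (client API): blocks until one message arrives.
  static String receiveOne() throws IOException {
    InetSocketAddress group = new InetSocketAddress(InetAddress.getByName(GROUP), PORT);
    try (MulticastSocket soc = new MulticastSocket(PORT)) {
      soc.joinGroup(group, null); // null lets the socket use its default interface
      byte[] buf = new byte[1024];
      DatagramPacket p = new DatagramPacket(buf, buf.length);
      soc.receive(p);
      soc.leaveGroup(group, null);
      return decode(p);
    }
  }
}
```

In a real client the receive call would run in its own thread and loop, handing each decoded message to the registered listener. Whether multicast packets actually arrive depends on the network, since many routers and some virtual environments drop them.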

The Interface
Java:
public interface ODBEventListening {
  public void odbEvent(ODBEvent e);
}
The ODBEvent
Java:
public class ODBEvent {
  /**
  Constructor
  @param msg String the EventMessage
  */
  public ODBEvent(String msg) {
    this.msg = msg;
  }
  /**
  getMessage
  @return String, the EventMessage
  */
  public String getMessage() {
    return msg;
  }
  private String msg;
}
And its eDict application (in JavaFX)
Java:
public class eDict extends Application implements ODBEventListening {
  public void start(Stage stage) {
    ...
  }
  ...
  // implement the ODBEvent-----------------------------------------------------
  public void odbEvent(ODBEvent event) {
    // do the rescue operation
  }
  ...
}
The following image shows how the DODB works with PubSub IMS:

AccessPath.png

So far, so good. The implementation of an ODB Server is naturally complex and requires a clear implementation concept from the developers. The best concept, in my opinion, is to implement a core package of APIs so that any ODB Server based on the core can be implemented independently of the GUI technology (e.g. SWING or JavaFX). As said before, an ODB is based on serialized objects, and the serialized data are OOPL-dependent. However, the API core, which relies on TCP/UDP Socket/MulticastSocket technology, is open to other OOPLs if the back-and-forth transfer of (serialized) data blocks is specified so that the serialized data for write/retrieve are format-free (e.g. a byte array). With this Data Transfer Specification, other OOPLs only need to serialize their objects on the client side in their own way and send them as a byte package via a TCP socket to a Java ODB Server. In my implementation I have specified the following buffer format for the Client-Server communication:

Buffer Format:

Code:
     Byte Pos.  length-in-bytes  name   OBBStream
       0:             1          bool   contains a boolean. x01 for true, x00 for false (default)
       1-4:           4          Int    contains an Int. 0 if no Int is needed (default)
       5-6:           2          msg    contains msg length, mLen can be 0
       7-8:           2          err    contains err length, eLen can be 0
       9-12:          4          list   contains list length, lLen can be 0
       13-16:         4          obj    contains the byte array length of an object, bLen can be 0
    ---------------- 17 bytes --------------------------------------------------------------------
       17            mLen               start of msg String
       a = 17+mLen   eLen               start of err String
       b = a+eLen    lLen               start of list or array of all String items (see note)
       c = b+lLen    bLen               start of byte array
    ----------------------------------------------------------------------------------------------
     TOTAL bytes for Buffer: len = c+bLen (min. 17 bytes)
Note:
  • List-Length is the number of bytes for the list, and not the number of elements (size)
  • If the length of a component X is 0, the following component starts at X's position. For example: if mLen, eLen and lLen are all 0 then a = 17, b = a, c = b.
  • The given 2 or 4 bytes represent the size of a short (2) or an int (4) and contain the value of the short or int, most significant byte first. Example: X = 10 as a short: byte 0 = 0x00, byte 1 = 0x0A. As an int: byte 0 = 0x00, byte 1 = 0x00, byte 2 = 0x00, byte 3 = 0x0A.
With this buffer format any memory block can be sent as a byte array over any socket in any OOPL. For example, in C# a MemoryStream can be used to build a variable-length buffer according to the specified Buffer Format, and its content can then be converted to the needed byte[ ] with the method ToArray().
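On the Java side the same layout can be sketched with java.nio.ByteBuffer, which writes shorts and ints most significant byte first by default and so matches the note above. This is only an illustration of the Buffer Format table: the class name OdbBuffer and its methods are hypothetical, UTF-8 for the msg and err strings is an assumption, and the list is treated as opaque bytes because its item layout is not specified here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class OdbBuffer {
  static final int HEADER = 17; // fixed header size from the Buffer Format table

  // Packs all components into one byte array following the Buffer Format.
  static byte[] pack(boolean flag, int value, String msg, String err, byte[] list, byte[] obj) {
    byte[] m = msg.getBytes(StandardCharsets.UTF_8); // assumed charset
    byte[] e = err.getBytes(StandardCharsets.UTF_8); // assumed charset
    if (m.length > 0xFFFF || e.length > 0xFFFF) {
      throw new IllegalArgumentException("msg/err length must fit into 2 bytes");
    }
    ByteBuffer bb = ByteBuffer.allocate(HEADER + m.length + e.length + list.length + obj.length);
    bb.put((byte) (flag ? 1 : 0));   // byte 0: bool
    bb.putInt(value);                // bytes 1-4: Int
    bb.putShort((short) m.length);   // bytes 5-6: mLen
    bb.putShort((short) e.length);   // bytes 7-8: eLen
    bb.putInt(list.length);          // bytes 9-12: lLen
    bb.putInt(obj.length);           // bytes 13-16: bLen
    bb.put(m).put(e).put(list).put(obj);
    return bb.array();
  }

  // Reads the msg String, which starts at position 17.
  static String msg(byte[] buf) {
    int mLen = ByteBuffer.wrap(buf).getShort(5) & 0xFFFF;
    return new String(buf, HEADER, mLen, StandardCharsets.UTF_8);
  }

  // Reads the object byte array, which starts at c = 17 + mLen + eLen + lLen.
  static byte[] obj(byte[] buf) {
    ByteBuffer bb = ByteBuffer.wrap(buf);
    int mLen = bb.getShort(5) & 0xFFFF;
    int eLen = bb.getShort(7) & 0xFFFF;
    int lLen = bb.getInt(9);
    int bLen = bb.getInt(13);
    int c = HEADER + mLen + eLen + lLen;
    return Arrays.copyOfRange(buf, c, c + bLen);
  }
}
```

For example, OdbBuffer.pack(true, 10, "ok", "", new byte[0], new byte[] {1, 2, 3}) yields 17 + 2 + 0 + 0 + 3 = 22 bytes, with byte 4 holding 0x0A and byte 6 holding the msg length 2.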

Distributed ODB means an ODB whose data are physically distributed among several nodes but appear to the users as if they were on one node. The problem that hicder talked about lies in the naming of the different tables (nodes): if one node is unavailable for whatever reason, the app runs into trouble with data gaps or even crashes. An object is an entity, and its data can be accessed only as a whole by the name of its class, which is used in the app and is therefore always accessible. When an object is instantiated it gets a unique name which can be used as the access key of the object (data). Therefore an SQL-like table for objects is not a necessity.

AccessSequence.jpg
The processing sequence.

With the PubSub IMS in place it is possible to catch the outage of a list (or table) during processing and to auto-update a list when a failed node restarts. The following images show how the app eDict catches the outage and how it updates the list after the recovery of a failed node.

app_1.jpg
eDict connects to DODB Server 9999 and gets the first part of the data (each entry is a POJO object)

app_2.jpg
The second DODB Server 8888 comes up; eDict reacts almost simultaneously and updates its table

app_3.jpg
DODB Server 8888 was down. Again eDict reacts and updates its table to suit the new situation

app_4.jpg
Second DODB Server 8888 comes up again and eDict starts to update its table

app_5.jpg
The primary DODB Server 9999 was down and eDict switches its connection directly to the second DODB Server 8888
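The reaction shown in the screenshots above can be sketched as a small bookkeeping class on the client side. This is a simplified illustration, not the eDict code: the class name NodeTracker and the "UP host:port" / "DOWN host:port" message texts are assumptions, and picking the longest-living node as the current one is just one possible policy.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class NodeTracker {
  // Live nodes in the order in which they came up.
  private final Set<String> live = new LinkedHashSet<>();

  // Applies one event message and returns true if the set of live nodes changed.
  // The "UP host:port" / "DOWN host:port" layout is an assumption.
  synchronized boolean onEvent(String message) {
    String[] parts = message.trim().split("\\s+", 2);
    if (parts.length != 2) {
      return false; // not a message this sketch understands
    }
    switch (parts[0]) {
      case "UP":
        return live.add(parts[1]);
      case "DOWN":
        return live.remove(parts[1]);
      default:
        return false;
    }
  }

  // The node the client should talk to: the oldest live node, or null if none is left.
  synchronized String current() {
    return live.isEmpty() ? null : live.iterator().next();
  }

  // Number of nodes currently known to be up.
  synchronized int size() {
    return live.size();
  }
}
```

Whenever onEvent returns true, the app would refresh its table from the nodes that are still live and, if current() has changed, switch its connection to the new node.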

As you see, instead of paying fees to a DB provider and depending on it, you can implement your own Distributed Object Database to suit your (company) requirements, and you are free. As Uncle Ho said, "Nothing is more precious than Independence", and he was right :)

Joe
Note: send me a message if you want the sources (written in pure Java) or the JAR files (JDK 8)
 