Sunday, October 30, 2011

Detect constrained tree schema programmatically.

Tree schema in a relational database is built on hierarchical relationships between relational tables. It is one of the most widely used type of schemas in OLTP applications today.

I present here the java code to identify if some or all the tables in a schema belong to a tree structure and what table sits at the top of this tree (or root depending on how you view it). The code also identifies the primary keys of the tables in tree hierarchy. This effectively identifies TPCC schema (used to measure TPCC benchmark by hardware and software vendors) as a constrained tree schema.

A constrained tree schema, is used in OLTP systems frequently to achieve very low latency and high through put. If you view TPCC schema carefully, then you would realize that "w_id" column in warehouse table is the root key of this tree hierarchy which also means that all the tables that belong to this tree, have composite primary keys that include "w_id", although w_id column is renamed as "d_w_id" in district table and "c_W_id" in customer table and so on.

Table partitioning (breaking data in a relational table into multiple paritions) is one of the ways an extreme transaction processing applications could achieve the desired scalability. The purpose of this post is to present the java code, if you want more information on constrained tree schema and applications on top of it, this article by  Billy Newport's article is a great start.

Identification of a constrained tree schema and the participating tables is important in evaluating if you could achieve single sited transactions that boost performance of an application dramatically.  "The end of an architecture era" is a wonderful document written by Michael Stonebreaker at MIT and others, that presents a change in application paradigm to achieve high scalability.

The Java code presented below should be used as a guidance only. Adding robust exception handling and follwoing best coding practices would help make this code production ready. The code assumes that you are using Sybase ASE database as a datasource that has "sp_fkeys" stored procedure which lists table dependencies through pk-fk constraints. You may have to rewrite the portion of the code  and the database connection information section, for the database you are using. Identifying TPCC schema as a Constrained Tree Schema is easy enough to do manually because it has only 8 tables, but compare it to an application like SAP R3 which has more than 75,000 tables in its schema and you will quickly appreciate the necessity of automation in this regard.

The code uses the depth first strategy rather than breadth first strategy, not for any specific reason. The java code treats relationships between the tables as a graph. It considers the tables as nodes and the relationship between them asedges of the graph. For more on this subject you may want to read this very interesting chapter from a book, the artful software.

Running this code displays the name of the root table and name of the root key and the depth of the tree. It also displays the list of tables that belong to the tree and their primary keys. There are other smaller trees in TPCC schema, but the one with deepest hierarchy is what we are interested in and the table at the top of the hierarchy is our root table. This information in addition to some other aspects such as tables sizes and volume of different transactions would help an application owner decide on the table partioning strategy.

//The java code
package YOUR PACKAGE NAME


import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;


import com.sybase.jdbc4.jdbc.SybCallableStatement;
import com.sybase.jdbcx.SybDriver;
public class LatestTpcc {
 /**
  * @param args
  * @throws SQLException
  */
 public static void main(String[] args) throws SQLException {
  // TODO Auto-generated method stub
  Connection con = null;
  try {
   SybDriver sybDriver = (SybDriver) Class.forName(
     "com.sybase.jdbc4.jdbc.SybDriver").newInstance();
   sybDriver.setVersion(com.sybase.jdbcx.SybDriver.VERSION_7);
   DriverManager.registerDriver(sybDriver);
   con = DriverManager.getConnection(
     YOUR CONNECTION INFORMATION  } catch (Exception e) {
   e.printStackTrace();
  }
  ArrayList<String> allTables = new LatestTpcc().getAllTables(con);
  // loop through all tables bring their pk, fks
  Iterator tableIterator = allTables.iterator();
  String tableName = null;
  ResultSet rs2 = null;
  HashSet<String> allKeysSet = new HashSet<String>();
  ArrayList<String> pair;
  HashSet<ArrayList> allPairsSet = new HashSet<ArrayList>();
  while (tableIterator.hasNext()) {
   tableName = (String) tableIterator.next();
   rs2 = new LatestTpcc().getFKeys(con, tableName);
   // loop through the result set
   while (rs2.next()) {
    pair = new ArrayList<String>();
    pair.add(rs2.getString("pkcolumn_name") + ":"
      + rs2.getString("pktable_name"));
    pair.add(rs2.getString("fkcolumn_name") + ":"
      + rs2.getString("fktable_name"));
    pair.add("notReached");
    allPairsSet.add(pair);
    // System.out.println("adding pair :"+rs2.getString("pkcolumn_name")+":"+rs2.getString("fkcolumn_name"));
    allKeysSet.add(rs2.getString("pkcolumn_name") + ":"
      + rs2.getString("pktable_name"));
    allKeysSet.add(rs2.getString("fkcolumn_name") + ":"
      + rs2.getString("fktable_name"));
   }
  }
  // now that we have all sets and everything let's light up
  Iterator allKeysSetIter = allKeysSet.iterator();
  String theCurrentKey = null;
  HashSet<String> c = new HashSet<String>();
  HashMap<String, HashSet> keysToColumns = new HashMap<String, HashSet>();
  int maxDepth = 0;
  int theDepth = 0;
  String maxDepthKey = "";
  while (allKeysSetIter.hasNext()) {
   theCurrentKey = (String) allKeysSetIter.next();
   theDepth = 0;
   // all pairs need to be set to non reached for every loop
   // so loop through and set to non reached
   Iterator allPairsSetIterTemp = allPairsSet.iterator();
   while (allPairsSetIterTemp.hasNext()) {
    ArrayList<String> keyPair = (ArrayList) allPairsSetIterTemp
      .next();
    if (keyPair.size() > 2) {
     keyPair.set(2, "notReached");
    }
   }
   // which pair has this key? I will add corresponding key in the pair
   // to this set c
   // so loop through allPairsSet
   boolean reached = false;
   do {
    Iterator allPairsSetIter = allPairsSet.iterator();
    reached = false;
    while (allPairsSetIter.hasNext()) {
     ArrayList<String> keyPair = (ArrayList) allPairsSetIter
       .next();
     boolean keyVisited = false;
     if (keyPair.size() > 2 && keyPair.get(2).equals("reached")) {
      keyVisited = true;
     }
     if ((((String) (keyPair.get(0))).equals(theCurrentKey) || new LatestTpcc()
       .gotAMatch(((String) (keyPair.get(0))), c))
       && keyVisited != true) {
      c.add(theCurrentKey);
      c.add((String) (keyPair.get(1)));
      keyPair.set(2, "reached");
      theDepth++;
      // c.add(String.valueOf(theDepth));
      // System.out.println("the depth of the tree :"+theDepth+" for: "+theCurrentKey);
      reached = true;
      // System.out.println("relationship :"+theCurrentKey+" : "+(String)(keyPair.get(1)));
     }
    }
   } while (reached == true);
   keysToColumns.put(theCurrentKey, c);
   if (theDepth > maxDepth) {
    maxDepth = theDepth;
    maxDepthKey = theCurrentKey;
   }
   c = new HashSet<String>();
  }
  new LatestTpcc().displayValues(keysToColumns, maxDepthKey);
  System.out.println("max depth key : " + maxDepthKey + " depth :"
    + maxDepth);


 }
 private void displayValues(HashMap map, String maxDepthKey) {
  Iterator mapIter = map.keySet().iterator();
  while (mapIter.hasNext()) {
   String key = (String) mapIter.next();
   HashSet<String> hs = (HashSet<String>) map.get(key);
   // display for only max depth key
   if (key.equals(maxDepthKey)) {
    System.out.println("Tree : " + key);
    Iterator hsIter = hs.iterator();
    while (hsIter.hasNext()) {
     String nextKey = (String) hsIter.next();
     System.out.println("element : " + nextKey);
    }
   }
  }
  // System.out.println("the end");
 }
 private boolean gotAMatch(String fetchedKey, HashSet<String> c) {
  // loop through c and see if you find a match
  Iterator cIter = c.iterator();
  while (cIter.hasNext()) {
   String fromC = (String) cIter.next();
   if (fromC.equals(fetchedKey)) {
    return true;
   }
  }
  return false;
 }
/**
 *
 * @param con
 * @return
 * Returns the list of all user tables in this database
 */
 private ArrayList<String> getAllTables(Connection con) {
  ArrayList<String> al = new ArrayList<String>();
  try {
   Statement stmt = con.createStatement();
   ResultSet rs = stmt
     .executeQuery("select name from sysobjects where type = 'U'");
   while (rs.next()) {
    al.add(rs.getString("name"));
   }
  } catch (Exception e) {
   e.printStackTrace();
  }
  return al;
 }
/**
 *
 * @param con
 * @param tableName
 * @return
 * returns information such as primary key & table name of the table that's dependent on this table
 */
 private ResultSet getFKeys(Connection con, String tableName) {
  ResultSet rs = null;
  try {
   Statement stmt = con.createStatement();
   String procName = "sp_fkeys";
   String execRPC = "{call " + procName + " (?)}";
   SybCallableStatement scs = (SybCallableStatement) con
     .prepareCall(execRPC);
   scs.setString(1, tableName);
   scs.setParameterName(1, "@pktable_name");
   rs = scs.executeQuery();
  } catch (Exception e) {
   e.printStackTrace();
  }
  return rs;
 }
}