BigData Mining with Fuzzy Logic Using the MapReduce Algorithm (MRA)

Joe

21/1/13
Having explained Fuzzy Search and Data Mining to you earlier, I show below one of the possible ways to do BigData mining with FuzzySearch. It is based on Google's MapReduce, simplified as follows:


(Source: talend.com)
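To make the map and reduce steps concrete, here is a minimal sketch in plain Java. It is not the FuzzyBigSearch code: the class name MapReduceSketch and its methods are hypothetical, the chunks are in-memory strings, and a case-insensitive substring count stands in for the real fuzzy matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the FuzzyBigSearch API.
class MapReduceSketch {

    // Map phase: for ONE chunk of text, count the hits of each pattern.
    // Assumption: a case-insensitive substring count replaces fuzzy matching.
    static Map<String, Integer> map(String chunk, List<String> patterns) {
        Map<String, Integer> counts = new TreeMap<>();
        String text = chunk.toLowerCase(Locale.ROOT);
        for (String p : patterns) {
            String needle = p.toLowerCase(Locale.ROOT);
            if (needle.isEmpty()) continue;
            int n = 0;
            for (int at = text.indexOf(needle); at >= 0; at = text.indexOf(needle, at + 1)) ++n;
            if (n > 0) counts.put(p, n);
        }
        return counts;
    }

    // Reduce phase: merge the partial results by summing the counts per pattern.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> part : partials) {
            part.forEach((pattern, n) -> total.merge(pattern, n, Integer::sum));
        }
        return total;
    }

    // Driver: map every chunk independently, then reduce all partial results.
    static Map<String, Integer> run(List<String> chunks, List<String> patterns) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (String chunk : chunks) partials.add(map(chunk, patterns));
        return reduce(partials);
    }

    public static void main(String[] args) {
        List<String> chunks = Arrays.asList("Fuzzy search on big data",
                                            "Big data mining with fuzzy logic");
        System.out.println(run(chunks, Arrays.asList("fuzzy", "data", "hadoop")));
    }
}
```

Because each call to map() depends only on its own chunk, the chunks could be processed on separate threads or machines, and only the small count tables have to be brought together in reduce().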

To do that, I modified and simplified the FuzzyPattern API and named the result FuzzyBigSearch. This new API is free, and its usage is shown in BigDataMining.java, which is in the ZIP provided for download. The sources can be downloaded from HERE. FuzzyBigSearch provides only ONE constructor. It takes an ArrayList of BigData sources, from the web (via URLs) and from the disks (via directories), together with an ArrayList of patterns.

Java:
  /**
  Constructor.
  <br>- URL must start with http:// or https://
  <br>- Absolute path. Only a remote path or directory must be preceded by file://
  @param bdLst ArrayList of Strings of URLs and/or paths
  @param pLst ArrayList of Strings of patterns
  @exception Exception if one of the URLs or paths is invalid
  */
  public FuzzyBigSearch(ArrayList<String> bdLst, ArrayList<String> pLst) throws Exception {
    for (int i = 0, mx = bdLst.size(); i < mx; ++i) {
      String u = bdLst.get(i);
      if (u.startsWith("http")) {
        try {
          // a web source is accepted if its stream can be opened
          (new URL(u)).openStream().close();
          continue;
        } catch (Exception ex) { } // not reachable: falls through to the path checks below
      }
      if (!u.startsWith("file://")) {
        // plain local path
        if (!(new File(u).exists())) throw new Exception("Invalid "+u);
      } else {
        URI uri = (new URL(u)).toURI();
        // e.g. file://C:/JFX/mra: "C:" is parsed as the authority, so rewrite the URL
        if (uri.getAuthority() != null && uri.getAuthority().length() > 0) {
          u = "file://" + u.substring(5);
          uri = (new URL(u)).toURI();
          bdLst.set(i, u); // store the rewritten form
        }
        if (!(new File(uri).exists())) throw new Exception("Invalid "+u);
      }
    }
    this.bdLst = bdLst;
    this.pLst = pLst;
    init();
  }
  ...
As you can see, the ArrayList bdLst contains Strings that are either valid URLs (format: https://anyLinkName) or valid paths (format: file://anyDirectoryName). Example: file://C:/JFX/mra (absolute path of the directory mra, see Fig.1).
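The following small sketch illustrates the two source formats and why the constructor rewrites an entry such as file://C:/JFX/mra. The class SourceFormatSketch and its methods are hypothetical helpers, not part of FuzzyBigSearch, and it uses java.net.URI directly instead of going through java.net.URL as the constructor does. With two slashes, java.net.URI reads the drive letter as the authority part of the URL, so it is no longer part of the path.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the FuzzyBigSearch API.
class SourceFormatSketch {

    enum Kind { WEB, DISK }

    // Assumption: anything that is not an http(s) URL is treated as a disk source.
    static Kind kindOf(String source) {
        boolean web = source.startsWith("http://") || source.startsWith("https://");
        return web ? Kind.WEB : Kind.DISK;
    }

    // True if the URL carries a non-empty authority (the part between "//" and the next "/").
    static boolean hasAuthority(String fileUrl) {
        String authority = URI.create(fileUrl).getAuthority();
        return authority != null && !authority.isEmpty();
    }

    // Same string rewrite as in the constructor: "file:" is 5 characters long,
    // so the remainder (starting at "//") is re-prefixed with "file://".
    static String normalize(String fileUrl) {
        return hasAuthority(fileUrl) ? "file://" + fileUrl.substring(5) : fileUrl;
    }

    public static void main(String[] args) {
        List<String> bdLst = Arrays.asList("https://anyLinkName", "file://C:/JFX/mra");
        for (String s : bdLst) {
            String shown = kindOf(s) == Kind.DISK ? normalize(s) : s;
            System.out.println(kindOf(s) + "  " + shown);
        }
    }
}
```

After the rewrite the authority is empty and the drive letter is back in the path, which is the form the constructor then hands to new File(uri).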


Fig.1


Fig.2


Fig.3

This technique lets you search for patterns "anywhere": simultaneously on the web and in selected areas of the disks (see Fig.2 and Fig.3).
 
