Parallel Programming

Joe

Thành viên VIP
21/1/13
2,895
1,293
113
Hi

Parallel Programming ? What is that? Most of us know what it is, but do most of us know how to code PP? We, human, do daily PP. We perform the seeing, hearing, feeling, talking, thinking simultaneously -or PARALLELLY. In Computer Science Parallelism is the computing algorithm that divides (or breaks) a process into many tasks so that they could be executed parallely. The following image depicts the "breaking algorithm"


(source: https://computing.llnl.gov)

The image shows you a process that is broken into 4 little tasks that can be executed parallelly by 4 processors. The problem of parallelism is NOT the technology that perform the parallel computing, but the state of the process in itself. There are processes that are intricately dependent from itself as a whole so that it is simply impossible to divide or to break it into different independent tasks. For example: the process Growth of a fruit. First the flower must be pollinated, then the fruit-development, then finally a ripe fruit. No Flower = No Pollination = No Development = No ripe Fruit. Or: Fruit = Flower->Pollination->Development->Ripeness. A breaking of the growth of a Fruit into 4 parallel little tasks (Flower, Pollination, Development and Ripeness) is here impossible.

However, most of IT problems or processes can be broken into many little tasks which are so independent from each other that they can be solved parallelly. Parallel Computing can be done in different levels:
  • bit-level
  • Instruction-level
  • task-level.
Bit-level is the lowest level and is the most efficient level. Normally a processor is driven by an instruction set (or assembly instructions). Each instruction is a chain of n bits (hence: 2, 4, 8, 16, 32, 64 bits, etc.). which are broken into different functional groups (e.g. address, data, control, etc.) and each group can be executed parallelly. Some computers used Bitslice processors to build a larger processor. In this case the n-bits are usually pipelined to enhance the parallelism (e.g. during the executing of an instruction the following instruction is preparing (or decoding) and the next is fetching). This technique of Instruction prefetching is the Pipelining technology. More: HERE

Instruction-level is only feasible if the executing instructions are independent from each other. For example, a calculation of n additions where the operands and results are independent from other additions. More: HERE. Intel-MultiCores is a typical Instruction-Level parallelism (n x 32-bits processors).

Task-level is more complex than the two mentioned Parallelism. A Task is an operation of n instructions. When a group of operations which are independent from each other they can be executed parallelly. In such a case Task-level is called Task Parallelism or Function Parallelism. More: HERE.

With C/C++ one can go deeper into the Instruction-Level. But JAVA is a high-level OOPL. I show you a way how to work with Java and Task-Parallelism. As an example we take an array of strings and the strings should be sorted alphabetically. Of course, JAVA provides already 2 different ways of String Sorting:
  • Collections.sort(List<String>)
  • Arrays.sort(String[] S)
Because the Strings are independent from each other it's relatively easy to tackle the sorting parallelly. The following image shows you how the "sorting tasks" parallelly work:

ParallelTask.png

Each "little task" checks for the position of the designated String within the source array and puts it in this position of a sorted array. N strings = N parallel Tasks. Because each computer has only a limited number X of cores (or processors) only X tasks can be executed parallelly while the others have to wait. JAVA ForkJoinPool and ForkJoinTask are the purposed for such a Parallelism. One should NOT confuse Concurrency with Parallelism. Concurrency is a competition of ALL tasks against each other at the same time -independent from the number of cores (processors).
Note: NOT every OOPL offers the Parallel Computing facilty.

The codes: JoeSortTask.java
PHP:
import java.util.List;
import java.util.concurrent.*;
/**
* Joe Nartca (C)
* JoeSort is an implementation of Concurrent & Parallel Programming
*/
public class JoeSortTask {
    /**
    List of String is converted into String array and then sorted. The Order of the sorted character set
    is defined by JAVA API Local
    @param list List of strings to be sorted
    @return a sorted String array
    */
    public static String[] listSort(List<String> list) {
        return parallelSort(list.toArray(new String[list.size()]));
    }
    /**
    Sorted String array. The Order of the sorted character set s defined by JAVA API Local
    @param sa string array
    @return sorted String array
    */
    public static String[] parallelSort(String[] sa) {
        String[] sorted = new String[sa.length];
        // submit & fork to run parallelly
        ForkJoinPool.commonPool().submit(()->{
          for (int j = 0; j < sa.length; ++j) {
            (new Sorting(j, sorted, sa)).fork();
          }
          return null;
        }).join(); // wait for all termination
        return sorted;
    }
    //
    private static class Sorting extends ForkJoinTask<Void> {
      public Sorting(int ix, String[] sorted, String[] sa) {
        this.sorted = sorted;
        this.sa = sa;
        this.ix = ix;
      }
      public boolean exec() {
        int p = 0;
        int ls = sa[ix].length();
        char b0 = sa[ix].charAt(0);
        A: for (int i = 0; i < sa.length; ++i)
        if (b0 > sa[i].charAt(0)) ++p;
        else if (i != ix && b0 == sa[i].charAt(0)) {
          int ld = sa[i].length();
          for (int l = 1; l < ls && l < ld; ++l)
          if (sa[ix].charAt(l) > sa[i].charAt(l)) {
            ++p; continue A;
          } else if (sa[ix].charAt(l) < sa[i].charAt(l)) continue A;
          if (ls > ld) ++p;
        }
        while (sorted[p] != null) ++p;
        sorted[p] = sa[ix];
        return true;
      }
      public Void getRawResult() { return null; }
      protected void setRawResult(Void v) { }
      //
      private String[] sa, sorted;
      private int ix;
    }
}
the test: SortTask.java
PHP:
import java.io.*;
import java.util.*;
import java.nio.file.*;
// Joe Nartca (C)
public class SortTask {
  public static void main(String... a) {
    try {
      int n;
      byte[] cont = Files.readAllBytes((new File(a.length>0?a[0]:"testFile.txt")).toPath());
      // do the JoeSorting....
      String big = (new String(cont)).replaceAll("\r","");
      String[] sa = Arrays.stream(big.split("\n"))
                .filter(s -> (s != null && s.length() > 0))
                .toArray(String[]::new);
      List<String> al = Arrays.asList(sa);
      // listSort of big String...
      long beg = System.nanoTime();
      String[] ss = JoeSortTask.listSort(al);
      System.out.println("\nTask.listSort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      n = 1;
      for (int i = 0; i < ss.length; ++n, ++i) System.out.println(n+".Position:"+ss[i]);
      // parallelSort of String array...
      beg = System.nanoTime();
      ss = JoeSortTask.parallelSort(sa);
      System.out.println("\nTask.parallelSort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      n = 1;
      for (int i = 0; i < ss.length; ++n, ++i) System.out.println(n+".Position:"+ss[i]);
      // create an ArrayList from sa[]
      beg = System.nanoTime();
      Collections.sort(al);
      System.out.println("\nCollections.Sort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      n = 1;
      for (int i = 0, mx = al.size(); i < mx; ++n, ++i)
        System.out.println(n+".Position:"+al.get(i));
      // run the Arrays API sort
      System.arraycopy(sa, 0, ss, 0, sa.length);
      beg = System.nanoTime();
      Arrays.sort(ss);
      System.out.println("\nArrays.Sort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      n = 1;
      for (int i = 0, mx = al.size(); i < mx; ++n, ++i)
        System.out.println(n+".Position:"+al.get(i));
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }
}
The List: testFile.txt
Code:
Huy
Paul
Trung
Chung
Arnold
Tom
Joe
joe
Annette
Elisabeth
Elli
Peter
Jane
Bill
Walton
Steven
Rick
James
Stephan
Jim
Barrack
Charles
Lan
Richard
Anna
Joe
Anton
Johannes
Jan
Margarette
Maria
Emma
Mike
Michael
Lauren
Thomas
William
Bill
Hillary
Arnie
Quintin
How run: on a CMD window.
Code:
C:\links\java\Sort>javac  -g:none -d ./classes *.java
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

C:\links\java\Sort>java SortTask > x.txt

C:\links\java\Sort>
and here is the sorted result:
Code:
Task.listSort:9728.0 microSec.
1.Position:Anna
2.Position:Annette
3.Position:Anton
4.Position:Arnie
5.Position:Arnold
6.Position:Barrack
7.Position:Bill
8.Position:Bill
9.Position:Charles
10.Position:Chung
11.Position:Elisabeth
12.Position:Elli
13.Position:Emma
14.Position:Hillary
15.Position:Huy
16.Position:James
17.Position:Jan
18.Position:Jane
19.Position:Jim
20.Position:Joe
21.Position:Joe
22.Position:Johannes
23.Position:Lan
24.Position:Lauren
25.Position:Margarette
26.Position:Maria
27.Position:Michael
28.Position:Mike
29.Position:Paul
30.Position:Peter
31.Position:Quintin
32.Position:Richard
33.Position:Rick
34.Position:Stephan
35.Position:Steven
36.Position:Thomas
37.Position:Tom
38.Position:Trung
39.Position:Walton
40.Position:William
41.Position:joe

Task.parallelSort:117.8 microSec.
1.Position:Anna
2.Position:Annette
3.Position:Anton
4.Position:Arnie
5.Position:Arnold
6.Position:Barrack
7.Position:Bill
8.Position:Bill
9.Position:Charles
10.Position:Chung
11.Position:Elisabeth
12.Position:Elli
13.Position:Emma
14.Position:Hillary
15.Position:Huy
16.Position:James
17.Position:Jan
18.Position:Jane
19.Position:Jim
20.Position:Joe
21.Position:Joe
22.Position:Johannes
23.Position:Lan
24.Position:Lauren
25.Position:Margarette
26.Position:Maria
27.Position:Michael
28.Position:Mike
29.Position:Paul
30.Position:Peter
31.Position:Quintin
32.Position:Richard
33.Position:Rick
34.Position:Stephan
35.Position:Steven
36.Position:Thomas
37.Position:Tom
38.Position:Trung
39.Position:Walton
40.Position:William
41.Position:joe

Collections.Sort:601.9 microSec.
1.Position:Anna
2.Position:Annette
3.Position:Anton
4.Position:Arnie
5.Position:Arnold
6.Position:Barrack
7.Position:Bill
8.Position:Bill
9.Position:Charles
10.Position:Chung
11.Position:Elisabeth
12.Position:Elli
13.Position:Emma
14.Position:Hillary
15.Position:Huy
16.Position:James
17.Position:Jan
18.Position:Jane
19.Position:Jim
20.Position:Joe
21.Position:Joe
22.Position:Johannes
23.Position:Lan
24.Position:Lauren
25.Position:Margarette
26.Position:Maria
27.Position:Michael
28.Position:Mike
29.Position:Paul
30.Position:Peter
31.Position:Quintin
32.Position:Richard
33.Position:Rick
34.Position:Stephan
35.Position:Steven
36.Position:Thomas
37.Position:Tom
38.Position:Trung
39.Position:Walton
40.Position:William
41.Position:joe

Arrays.Sort:26.8 microSec.
1.Position:Anna
2.Position:Annette
3.Position:Anton
4.Position:Arnie
5.Position:Arnold
6.Position:Barrack
7.Position:Bill
8.Position:Bill
9.Position:Charles
10.Position:Chung
11.Position:Elisabeth
12.Position:Elli
13.Position:Emma
14.Position:Hillary
15.Position:Huy
16.Position:James
17.Position:Jan
18.Position:Jane
19.Position:Jim
20.Position:Joe
21.Position:Joe
22.Position:Johannes
23.Position:Lan
24.Position:Lauren
25.Position:Margarette
26.Position:Maria
27.Position:Michael
28.Position:Mike
29.Position:Paul
30.Position:Peter
31.Position:Quintin
32.Position:Richard
33.Position:Rick
34.Position:Stephan
35.Position:Steven
36.Position:Thomas
37.Position:Tom
38.Position:Trung
39.Position:Walton
40.Position:William
41.Position:joe
As you see, JoeSortTask.parallelSort is only slower than the native Arrays.sort (in Instruction-level which is naturally FASTER than in the Task-Level)
 
Sửa lần cuối:

Joe

Thành viên VIP
21/1/13
2,895
1,293
113
Is this the same as multithreading?
It's the question about Computer Science. People usually confuse
  • Multi-Threading with
  • Multi-Tasking and with
  • Parallel Programming
Let me clarify these techniques for you and CongdongJava. First of all all these techniques share the only purpose to enhance the throughput of a process (or program). The throughput is no other thing than the processing efficiency of a process within a certain amount of time.

Definition:
A task is an entity that runs independently in an OS environment. Example: Printer-Task, Console-Task, Communication Task, etc. Each task has its own address space and environment. In a broad view a task is an independent process. If a task that can be broken down into some little independent chunks and these chunks can be executed independently these chunks are threads in Computer Science. That means that threads don't have an own address space or own environment, but have to share it with their owner (the app). Therefore threads must be synchronized with each other while Tasks must not.

Processing:
From this definition tasks are the "threads" of the Operating System (OS) while threads are "tasks" of an app which is run as a task within the OS. Or in other words: task is executed by the CPU controlled by OS and threads by the "workers" controlled by the app. Workers are the features of a Programming Language (PL). And only modern PL provides these features.

Multi-Tasking:
OS is a very BIG app that runs the hardware that is specifically built for it. This hardware is also called the CPU (Central Processing Unit) or Processor for short. In the past Multi-Processors computer is rare and extremely expensive. Most of the middle ranged computers were single CPU (WANG, DEC, PRIME, TANDEM, etc.) To make sure that all tasks (printer, console, etc.) are equally executed OS divides a certain time amount into a definite interval for each task and runs sequentially round-robin around the tasks. More tasks = small time interval. The time interval is called the time-slice or time-window for running each task and for the users it seemed as if the OS ran parallelly multiple tasks. The Multi-Tasking.

Multi-Threading:
Modern Programming Language usually provides the programming features of threads which are exactly similar to the OS features with Tasks. Instead of calling "Multi-Tasking" it is "Multi-Threading". In some sense they are the same. If you see an user app as an "user OS" then threads are the "user tasks".

Parallel Programming (Parallelism):
Today every desktop is equipped by multiple Cores processor. Multiple-Cores PC is omnipresent, but not everyone knows how to work with it. All modern PL or OOPL were designed before Multi-Cores era breaks in. Parallel Programming is only feasible if the tasks or threads are fully independent from each other AND the computer must be a Multi-Cores Computer.

Concurrent Programming (Concurrency):
Concurrency is a try to run several tasks parallelly -independent from the number of cores. Concurrency runs without problem on a single core as well as on a multiple cores computer. And the parallelism is made upon a competition. The word "Work Stealing" is the notation for such a competition. It's the concurrent algorithm: N cores are available, some cores are free and some others are overloaded with works. When the concurrency detects a free core Y and core X is overloaded with many works (tasks or threads) it takes (steals) some works of core X and puts them onto the free core Y.
 

Joe

Thành viên VIP
21/1/13
2,895
1,293
113
(cont.)
The API java.util.Arrays does offer the method parallelSort(String[ ]), too. Let check this method against the Parallel Programming JoeSortTask.parallelSort(String[ ]). To do that we modify the test program SortTask.java by including the Arrays.paralelSort() without printing the sorting result (because we know that they work) as following:
PHP:
import joe.JoeSortTask;
public class SortTaskX {
  public static void main(String... a) {
    try {
      byte[] cont = Files.readAllBytes((new File(a.length>0?a[0]:"testFile.txt")).toPath());
      // do the JoeSorting....
      String big = (new String(cont)).replaceAll("\r","");
      String[] sa = Arrays.stream(big.split("\n"))
                .filter(s -> (s != null && s.length() > 0))
                .toArray(String[]::new);
      List<String> al = Arrays.asList(sa);
      // listSort of big String...
      long beg = System.nanoTime();
      String[] ss = JoeSortTask.listSort(al);
      System.out.println("\nTask.listSort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      // parallelSort of String array...
      beg = System.nanoTime();
      ss = JoeSortTask.parallelSort(sa); // <<-- Parallel Programming 
      System.out.println("\nTask.parallelSort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      // create an ArrayList from sa[]
      beg = System.nanoTime();
      Collections.sort(al);
      System.out.println("\nCollections.Sort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      // run the Arrays API sort
      System.arraycopy(sa, 0, ss, 0, sa.length);
      beg = System.nanoTime();
      Arrays.sort(ss);
      System.out.println("\nArrays.Sort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
      // run the Arrays API sort
      System.arraycopy(sa, 0, ss, 0, sa.length);
      beg = System.nanoTime();
      Arrays.parallelSort(ss);  // <<----java.util.Arrays.parallelSort() 
      System.out.println("\nArrays.parallelSort:"+((double)(System.nanoTime()-beg)/1000)+
                 " microSec.");
    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }
}
The result:
Code:
C:\links\java\Sort>javac -g:none -d ./classes SortTaskX.java

C:\links\java\Sort>java SortTaskX

Task.listSort:11172.2 microSec.

Task.parallelSort:169.8 microSec.

Collections.Sort:581.2 microSec.

Arrays.Sort:30.5 microSec.

Arrays.parallelSort:440.3 microSec.

C:\links\java\Sort>
The Parallel Programming JoeSortTask.parallelSort() needs 169.8 microSec. to complete the sorting while the "standard" java.util.Arrays.parallelSort() takes 440.3 microSec. to finish the same sorting work.


Conclusion: Parallel Programming requires a smart analysis of a process to find out which parts of the process are independent from the rest and HOW they could be processed parallelly. The algorithm is yours (like what I do with the JoeSortTask). In short: Imagination and Abstract-Thinking are the bases of Parallel Programming.
 
Sửa lần cuối: