JCL Big Sorting

 

What does the sample do?

JCL_Big_Sorting is a distributed sorting application that mantains the items sorted and distributed over a cluster, avoiding large amount of data transfers.

 

How do I run it?

It is necessary, first, to start a JCL cluster. Then you must import the JCL Sorting Eclipse project. The application will generate the input files per core of the cluster and sort them, producing a unique sorted and distributed output.

How do I use it?

The big sorting has no GUI, so let's look at the code.

In our Main.java file, inside the constructor, the first 22 lines refer to generation of the input files, which are generated randomly. An input file is created for each cluster core. The user can create small or large amounts of numbers or items during this phase. 

Once the files have already been generated, the Sorting.java class is registered, which is responsible for sorting. It is divided into 4 phases. In the first phase a local traversal of items is performed. The sorting key idea is based on items frequencies, so for each number found in the local input its frequency is incremented. After the traversal, a map of items is stored in JCL multicore version for the next phases. Before finishing the first phase, each task mount a local partition schema for the items, i.e. each task partitions its numbers according to their frequencies and the number of cores in JCL cluster. The output of this function is a set of pivots numbers and the cumulative frequencies until each pivot. In summary, each task is given its idea of how to partition its numbers according to each core of the cluster.

Unfortunately, each task has only its local partition idea, so the Main.java class, which started all tasks for the phase one, receives all tasks local schemas and performs a global partition using all the results of the first phase.  Then the phase two starts, so the Main.java class starts other n tasks to apply the global partition schema in the local data, computed in phase one and stored in JCL, as mentioned before. The number n is the number of cores in the cluster, so in phases one and two there is one task per cluster core.  Each task traversals its map of items, separating the numbers according to the global partition, i.e. according to each core of the JCL cluster. For that a set of JCL-Hashmaps is shared by all tasks. The JCL-Hashmap represents a simple way in JCL to share common resources.  There is one JCL-Hashmap per cluster core and each map entry represents all the items and their frequencies of a task in the cluster. For example, a task 1, representing core 1 in a cluster composed of 64 cores and 64 tasks and 64 JCL-Hashmaps, will insert one item in each JCL-Hashmap. The item is the task 1 numbers of the respective core. In summary, task 1 will insert 64 items, one per JCL-Hashmap. Each entry of each JCL-Hashmap is labeled with task 1 to avoid overlaps.  After inserting the last entry in the last JCL-Hashmap the phase two finishes.

The Main.java creates a synchronization barrier to wait for all tasks of the second phase, so the third phase can start after all results of phase two. The phase three starts n tasks again, where n is the number of cores in the cluster, as mentioned before. Each task accesses its JCL-Hashmap with its name and inside each map there are the numbers that such task will handle. Each entry of the JCL-Hashmap will have the numbers stored initially by the other cores, so it is necessary to merge the items and then sort them. For that, each task traversals the map, merging the items. After that, a local sorting is performed and the JCL-Hashmap can be deleted.

After phase three the Main.java can traverse all items using the global partition schema and using all cores of a JCL cluster to store each input partition. The fourth phase just validate the input and the output of the JCL-Big Sorting application.

 

Questions or comments, where can I go?

Questions about the API or about the codes of this application? See our Programming Guide and Installation Guide.

If you have any questions, please contact the JCL team.