Platform Analytics 7 Dataflow

WI_NUMBEROFJOBS table

This document describes the WI_NUMBEROFJOBS table and how each of its columns is derived from the LSB_EVENTS table.
The WI_NUMBEROFJOBS table gets its data from the data collection table LSB_EVENTS.
The data is grouped in one of five ways ("By Project", "By Cluster", "By Queue", "By Host" or "By User") for each of the four job count statistics ("Run", "Pend", "Susp" and "Wait"). The aggregated records are then written to this table.
The following describes each column of WI_NUMBEROFJOBS and how it is populated.
Columns
Description
Key
CLUSTER_CODE
This comes from the "ClusterName" field in the raw table. The cluster name is looked up in the wi_clustercode table to see whether a record for it already exists. If it does, the existing code is returned; otherwise, the cluster name is inserted into the wi_clustercode table and a new code is generated. The code is a positive integer, and each new code equals the current maximum sequence number plus 1.
Primary key
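The lookup-or-insert step described for CLUSTER_CODE can be sketched as follows. This is a minimal illustration only: a Python dictionary stands in for the wi_clustercode table, the function name get_or_create_code is hypothetical, and the assumption that the first code issued is 1 is not stated in the description above.

```python
from typing import Dict


def get_or_create_code(code_table: Dict[str, int], name: str) -> int:
    """Return the code for name, creating a new one if it is not yet known."""
    if name in code_table:
        # A record already exists: return its code unchanged.
        return code_table[name]
    # No record yet: the new code is the current maximum plus 1.
    # Assumption: an empty table starts at 1 (codes are positive integers).
    new_code = max(code_table.values(), default=0) + 1
    code_table[name] = new_code
    return new_code


# The dictionary below is a stand-in for the wi_clustercode table.
wi_clustercode: Dict[str, int] = {}
first = get_or_create_code(wi_clustercode, "cluster_a")
second = get_or_create_code(wi_clustercode, "cluster_b")
repeat = get_or_create_code(wi_clustercode, "cluster_a")
```

In a real database, the lookup and insert would need to run in one transaction so that two loaders cannot generate the same code at the same time.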
TIME_STAMP
This is in GMT and always falls on the hour, such as 02:00:00. It marks the hour for which this record is aggregated. For example, 02:00:00 means that all the records between 02:00:00 and 02:59:00 are aggregated into this record.
Primary key
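As an illustration of how a record's time maps to its TIME_STAMP value, the sketch below truncates a time to the start of its hour in GMT. The function name hour_bucket is hypothetical, and the input is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone


def hour_bucket(event_time: datetime) -> datetime:
    """Return the start of the GMT hour that contains event_time.

    Assumption: event_time is timezone-aware. A naive datetime would be
    interpreted as local time by astimezone().
    """
    in_gmt = event_time.astimezone(timezone.utc)
    return in_gmt.replace(minute=0, second=0, microsecond=0)


# A record observed at 02:59:00 GMT belongs to the 02:00:00 bucket.
sample = datetime(2024, 1, 1, 2, 59, 0, tzinfo=timezone.utc)
bucket = hour_bucket(sample)
```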
GROUP_NAME
This indicates the kind of grouping that the statistics are aggregated by. There are five kinds of grouping: "By Cluster", "By Host", "By User", "By Project" and "By Queue".
Primary key
GROUP_CODE
The source of this value depends on the group_type.
If the group_type is "By Host", then it comes from the "HostName" field of the data file.
If the group_type is "By Cluster", then it comes from the "ClusterName" field of the data file.
If the group_type is "By Queue", then it comes from the "QueueName" field of the data file.
If the group_type is "By Project", then it comes from the "ProjectName" field of the data file.
If the group_type is "By User", then it comes from the "UserName" field of the data file.
The value is then looked up in the wi_dimensioncode table, matching on dimension_name, to see whether a record for it already exists. If it does, the existing code is returned; otherwise, the value is inserted into the wi_dimensioncode table and a new code is generated. The code is a positive integer, and each new code equals the maximum existing code plus 1.
Primary key
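The selection of the source field for GROUP_CODE can be sketched as a simple mapping from group_type to field name, followed by the code lookup. The names GROUP_FIELD and group_code are hypothetical, a dictionary keyed by dimension_name stands in for the wi_dimensioncode table, and starting the codes at 1 is an assumption.

```python
from typing import Dict

# Source field used for each group_type, as listed in the description.
GROUP_FIELD = {
    "By Host": "HostName",
    "By Cluster": "ClusterName",
    "By Queue": "QueueName",
    "By Project": "ProjectName",
    "By User": "UserName",
}


def group_code(record: Dict[str, str], group_type: str,
               wi_dimensioncode: Dict[str, int]) -> int:
    """Resolve the GROUP_CODE for one raw record (illustrative only)."""
    dimension_name = record[GROUP_FIELD[group_type]]
    if dimension_name not in wi_dimensioncode:
        # New dimension: code is the maximum existing code plus 1.
        # Assumption: the first code issued is 1.
        next_code = max(wi_dimensioncode.values(), default=0) + 1
        wi_dimensioncode[dimension_name] = next_code
    return wi_dimensioncode[dimension_name]


codes: Dict[str, int] = {}
record = {"HostName": "host01", "QueueName": "normal", "UserName": "alice"}
host_code = group_code(record, "By Host", codes)
queue_code = group_code(record, "By Queue", codes)
host_code_again = group_code(record, "By Host", codes)
```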
JOB_STATUS
This field indicates which job status the value columns are aggregated for within the grouping. It is one of "SUSP" (suspended job), "RUN" (running job), "WAIT" (waiting job) or "PEND" (pending job).
Primary key
JOB_TYPE
This is the type of job. It is one of "BOTH", "BATCH", "INTERACTIVE" or "PARALLEL".
Primary key
TOTAL_JOB_NUMBER
The aggregation depends on the group_type.
If the group_type is "By Project", then the aggregation is grouped by "ProjectName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Host", then the aggregation is grouped by "HostName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By User", then the aggregation is grouped by "UserName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Queue", then the aggregation is grouped by "QueueName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Cluster", then the data is first aggregated across the interval by "ClusterName", "JobType" and "Timediff". It is then aggregated for the hour, grouped by "ClusterName", "Jobtype" and "Timediff".
The aggregation then calculates the sum, the minimum, the maximum and the record count for each of the four job statuses for the hour.
The sum is stored in this column for each record.
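The hourly aggregation for the "By Project", "By Host", "By User" and "By Queue" groupings can be sketched as below. This is an illustration under assumptions, not the product's implementation: each raw sample is assumed to be a dictionary carrying one job count per status under the hypothetical keys "RUN", "PEND", "SUSP" and "WAIT", and the function name aggregate is hypothetical.

```python
from collections import defaultdict

# Hypothetical keys holding the per-status job counts in each raw sample.
STATUSES = ("RUN", "PEND", "SUSP", "WAIT")


def aggregate(samples, group_field):
    """Group samples and compute sum, min, max and count per job status."""
    grouped = defaultdict(list)
    for s in samples:
        key = (s[group_field], s["ClusterName"], s["Jobtype"], s["Timediff"])
        for status in STATUSES:
            grouped[key + (status,)].append(s[status])
    return {
        key: {
            "TOTAL_JOB_NUMBER": sum(values),
            "MIN_JOB_NUMBER": min(values),
            "MAX_JOB_NUMBER": max(values),
            "COUNTER": len(values),
        }
        for key, values in grouped.items()
    }


samples = [
    {"ProjectName": "p1", "ClusterName": "c1", "Jobtype": "BATCH",
     "Timediff": 0, "RUN": 3, "PEND": 1, "SUSP": 0, "WAIT": 0},
    {"ProjectName": "p1", "ClusterName": "c1", "Jobtype": "BATCH",
     "Timediff": 0, "RUN": 5, "PEND": 0, "SUSP": 0, "WAIT": 2},
]
result = aggregate(samples, "ProjectName")
run_row = result[("p1", "c1", "BATCH", 0, "RUN")]
```

Each entry of the result corresponds to one output record: the group key plus the job status, with the four aggregated values.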

MIN_JOB_NUMBER
The aggregation depends on the group_type.
If the group_type is "By Project", then the aggregation is grouped by "ProjectName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Host", then the aggregation is grouped by "HostName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By User", then the aggregation is grouped by "UserName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Queue", then the aggregation is grouped by "QueueName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Cluster", then the data is first aggregated across the interval by "ClusterName", "JobType" and "Timediff". It is then aggregated for the hour, grouped by "ClusterName", "Jobtype" and "Timediff".
The aggregation then calculates the sum, the minimum, the maximum and the record count for each of the four job statuses for the hour.
The minimum is stored in this column for each record.

MAX_JOB_NUMBER
The aggregation depends on the group_type.
If the group_type is "By Project", then the aggregation is grouped by "ProjectName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Host", then the aggregation is grouped by "HostName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By User", then the aggregation is grouped by "UserName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Queue", then the aggregation is grouped by "QueueName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Cluster", then the data is first aggregated across the interval by "ClusterName", "JobType" and "Timediff". It is then aggregated for the hour, grouped by "ClusterName", "Jobtype" and "Timediff".
The aggregation then calculates the sum, the minimum, the maximum and the record count for each of the four job statuses for the hour.
The maximum is stored in this column for each record.

COUNTER
The aggregation depends on the group_type.
If the group_type is "By Project", then the aggregation is grouped by "ProjectName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Host", then the aggregation is grouped by "HostName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By User", then the aggregation is grouped by "UserName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Queue", then the aggregation is grouped by "QueueName", "ClusterName", "Jobtype" and "Timediff".
If the group_type is "By Cluster", then the data is first aggregated across the interval by "ClusterName", "JobType" and "Timediff". It is then aggregated for the hour, grouped by "ClusterName", "Jobtype" and "Timediff".
The aggregation then calculates the sum, the minimum, the maximum and the record count for each of the four job statuses for the hour.
The count is stored in this column for each record.
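The two-stage aggregation used for the "By Cluster" grouping can be sketched as follows. Several details are assumptions: the description does not say which function the first stage applies, so a sum across the rows of each sampling interval is assumed here; the "Interval" key identifying the sampling interval is hypothetical; and the function name aggregate_by_cluster is hypothetical.

```python
from collections import defaultdict


def aggregate_by_cluster(samples, status):
    """Two-stage aggregation for one job status (illustrative only)."""
    # Stage 1 (assumed to be a sum): total per cluster for each interval.
    per_interval = defaultdict(int)
    for s in samples:
        key = (s["ClusterName"], s["Jobtype"], s["Timediff"], s["Interval"])
        per_interval[key] += s[status]

    # Stage 2: collect the interval totals that belong to the same hour.
    per_hour = defaultdict(list)
    for (cluster, job_type, timediff, _interval), total in per_interval.items():
        per_hour[(cluster, job_type, timediff)].append(total)

    return {
        key: {
            "TOTAL_JOB_NUMBER": sum(values),
            "MIN_JOB_NUMBER": min(values),
            "MAX_JOB_NUMBER": max(values),
            "COUNTER": len(values),
        }
        for key, values in per_hour.items()
    }


base = {"ClusterName": "c1", "Jobtype": "BATCH", "Timediff": 0}
cluster_samples = [
    dict(base, Interval=1, RUN=2),
    dict(base, Interval=1, RUN=3),
    dict(base, Interval=2, RUN=1),
    dict(base, Interval=2, RUN=1),
]
cluster_result = aggregate_by_cluster(cluster_samples, "RUN")
```

Under these assumptions, COUNTER for a cluster counts sampling intervals in the hour rather than raw rows, because the rows of each interval are combined before the hourly statistics are taken.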

LOCAL_SERVERTIME
This is derived from the TIME_STAMP column of this table, converted from GMT to local server time.
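A minimal sketch of the conversion from the GMT TIME_STAMP to local server time is shown below. The function name to_local_servertime and the server_tz parameter are hypothetical; when server_tz is omitted, Python uses the time zone of the machine running the code, which is assumed here to be the server.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_local_servertime(time_stamp_gmt: datetime,
                        server_tz: Optional[timezone] = None) -> datetime:
    """Convert a naive GMT timestamp to local server time."""
    # Mark the stored value as GMT, then shift it to the target zone.
    aware = time_stamp_gmt.replace(tzinfo=timezone.utc)
    # astimezone(None) converts to the system's local time zone.
    return aware.astimezone(server_tz)


# Example with a fixed offset of GMT-5 as a stand-in for the server zone.
gmt_minus_5 = timezone(timedelta(hours=-5))
local = to_local_servertime(datetime(2024, 1, 1, 2, 0, 0), gmt_minus_5)
```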

INSERT_SEQ
This is a system-generated sequence number. Each newly inserted record is assigned a unique sequence number in this column.