Platform Analytics 7 Dataflow

WI_PENDJOBNUM_BYGROUPS table

The wi_pendjobnum_bygroups table gets its information from the Data Collection Table DPR_BYINTERVAL.
For every hour, the ETL groups the data by different group types and combines this with the number of jobs belonging to the pending reason before inserting the resulting record into this table.
The following table describes each column and its source:
Column name
Source and description
Key
CLUSTER_CODE
This comes from the CLUSTER_NAME field in the raw table. The ETL looks up the cluster name in the wi_clustercode table to see whether a record of it already exists. If it does, the existing code is returned; otherwise, the cluster name is inserted into the wi_clustercode table and a new code is generated. The code is a positive integer, and each new code is equal to the maximum existing code plus 1.
Primary key
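The lookup-or-generate rule described for CLUSTER_CODE can be sketched as follows. This is a minimal illustration only, not the ETL's actual implementation: the function name get_or_create_code and the in-memory dictionary standing in for the wi_clustercode table are hypothetical, and starting at 1 when the table is empty is an assumption.

```python
def get_or_create_code(codes: dict[str, int], name: str) -> int:
    """Return the existing code for name, or assign max(existing codes) + 1.

    `codes` is an in-memory stand-in for the wi_clustercode table
    (hypothetical; the real ETL reads and writes a database table).
    """
    if name in codes:
        return codes[name]
    # Assumption: the first code issued for an empty table is 1.
    new_code = max(codes.values(), default=0) + 1
    codes[name] = new_code
    return new_code


# Illustrative contents only; the cluster names are made up.
cluster_codes = {"cluster_a": 1, "cluster_b": 2}
known = get_or_create_code(cluster_codes, "cluster_b")  # already recorded
fresh = get_or_create_code(cluster_codes, "cluster_c")  # newly generated
```

The same pattern would plausibly apply to the lookups against the wi_dimensioncode table, with dimension_name as the key.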
TIME_STAMP
This is the time stamp, in GMT, of the start of the hour over which records are collected. For example, 02:00:00 means that all records between 02:00:00 and 02:59:59 are collected into this record.
Primary key
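As an illustration of the hourly bucketing described for TIME_STAMP, the sketch below truncates a time stamp to the start of its hour in GMT. The function name hour_bucket and the sample date are hypothetical, and the sketch assumes the raw time stamps are timezone-aware.

```python
from datetime import datetime, timezone


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timezone-aware time stamp to the start of its hour in GMT (UTC)."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


# A record collected at 02:59:59 GMT falls into the 02:00:00 bucket.
sample = datetime(2024, 1, 15, 2, 59, 59, tzinfo=timezone.utc)
bucket = hour_bucket(sample)
```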
GROUP_TYPE
This is the type of group to which this record belongs. Valid values are "By Host Type", "By Reason", and "By Cluster".
Primary key
GROUP_CODE
This is the group code. There are different sources depending on the group_type:
For "By Host Type", the group name is a subfield of the PendReason field in the data file.
For "By Cluster", the group name is the ClusterName field in the data file.
For "By Reason", the group name is a subfield of the PendReason field in the data file.
The group code is obtained by looking up the group name and matching it to the dimension_name in the wi_dimensioncode table and generating a new code if it doesn't exist.
Primary key
REASON_TYPE_CODE
This is the reason type code. The source is the PendReasonType field in the data file or each pending reason subrecord of the PendReason field in the data file.
The reason type code is obtained by looking up the reason type or pending reasons and matching them to the dimension_name in the wi_dimensioncode table and generating a new code if they don't exist.
Primary key
REASON_CODE
This is the reason code. The source is each pending reason subrecord of the PendReason field in the data file.
The reason code is obtained by looking up the pending reasons and matching them to the dimension_name in the wi_dimensioncode table and generating a new code if it doesn't exist.
Primary key
MAX_JOB_NUM
This is the maximum number of jobs. The source is each pending reason subrecord of the PendReason field in the data file.
The maximum number of jobs is the maximum number of distinct job IDs grouped by the same ClusterName, PendReasonType, PendReason, HostType, TimeDiff, and CoreHour fields across the same interval within the hour.

MIN_JOB_NUM
This is the minimum number of jobs. The source is each pending reason subrecord of the PendReason field in the data file.
The minimum number of jobs is the minimum number of distinct job IDs grouped by the same ClusterName, PendReasonType, PendReason, HostType, TimeDiff, and CoreHour fields across the same interval within the hour.

TOT_JOB_NUM
This is the total number of jobs. The source is each pending reason subrecord of the PendReason field in the data file.
The total number of jobs is the total number of distinct job IDs grouped by the same ClusterName, PendReasonType, PendReason, HostType, TimeDiff, and CoreHour fields across all intervals within the hour.
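One way to read the MAX_JOB_NUM, MIN_JOB_NUM, and TOT_JOB_NUM descriptions is sketched below: distinct job IDs are counted per sampling interval for each group, the maximum and minimum are taken over those per-interval counts, and the total counts distinct job IDs across all intervals in the hour. This is an interpretation, not the confirmed ETL logic. The function name job_num_stats, the (group_key, interval, job_id) row layout, and the sample values are hypothetical, and intervals in which a group has no pending jobs are assumed to be absent from the input, so they do not lower the minimum.

```python
from collections import defaultdict


def job_num_stats(rows):
    """Compute (max_job_num, min_job_num, tot_job_num) per group for one hour.

    rows: iterable of (group_key, interval, job_id) tuples, where group_key
    stands in for the grouping fields named in the column descriptions.
    """
    # group_key -> interval -> set of distinct job IDs seen in that interval
    per_interval = defaultdict(lambda: defaultdict(set))
    for group_key, interval, job_id in rows:
        per_interval[group_key][interval].add(job_id)

    stats = {}
    for group_key, intervals in per_interval.items():
        counts = [len(jobs) for jobs in intervals.values()]
        all_jobs = set().union(*intervals.values())
        stats[group_key] = (max(counts), min(counts), len(all_jobs))
    return stats


# Made-up sample: interval 0 has jobs 101-103, interval 1 has jobs 102 and 104.
rows = [
    ("g1", 0, 101), ("g1", 0, 102), ("g1", 0, 103),
    ("g1", 1, 102), ("g1", 1, 104),
]
stats = job_num_stats(rows)
```

In this sample the per-interval counts are 3 and 2, and four distinct job IDs appear across the hour, so the group would get a maximum of 3, a minimum of 2, and a total of 4.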

LOCAL_SERVERTIME
This is the time stamp of the record, localized to the local time of the server.

COUNTER
This is the number of source records collected into this record.

INSERT_SEQ
This is a unique sequence number generated by the system for each new record inserted.