Platform Analytics 7 Dataflow

WI_JOBRESUSAGECOST table

This section describes the WI_JOBRESUSAGECOST table and how each of its columns is derived from the LSB_EVENTS table.
WI_JOBRESUSAGECOST gets its data from the data collection table LSB_EVENTS, using only records whose event type is 'JOB_FINISH'.
We read each record from LSB_EVENTS whose event type is 'JOB_FINISH'. Each of these records then goes through an interval aggregation grouped by "QueueTime", "JobID", "JobArrayIndex", "Cluster", "User", "Project", "Resource" and "StartTime", which eliminates duplicate records and adds together the resource usage for records of the same kind. We then insert the resulting records into this table.
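The grouping step above can be illustrated with a minimal Python sketch. It is not the product's implementation: the record layout (plain dictionaries), the "Usage" field name, and the aggregate function are assumptions made for illustration, and the sketch simply sums usage within each group without modelling how exact duplicates or time intervals are handled.

```python
from collections import defaultdict

# Grouping keys as named in the text; using them as dictionary keys is an assumption.
GROUP_KEYS = ("QueueTime", "JobID", "JobArrayIndex", "Cluster",
              "User", "Project", "Resource", "StartTime")


def aggregate(records):
    """Collapse records that share all grouping keys, summing their usage."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[k] for k in GROUP_KEYS)
        totals[key] += rec["Usage"]  # "Usage" is a hypothetical field name
    # Rebuild one output record per distinct key, in first-seen order.
    return [dict(zip(GROUP_KEYS, key), Usage=total)
            for key, total in totals.items()]
```

Two records that agree on all eight keys come out as a single record whose usage is the sum of the two; a record that differs in any key (for example, a different "Resource") stays separate.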
The following table describes each column of WI_JOBRESUSAGECOST and how it is filled with data.
Column Name
Description
Key
CLUSTER_CODE
This comes from the CLUSTER_NAME field in the raw table. Once we have the cluster name, we look it up in the wi_clustercode table to see whether a record for it already exists. If it does, we use the existing code; otherwise, we insert the name into the wi_clustercode table and generate a new code. The code is a positive integer, and each new code equals the current maximum sequence number plus 1.
Primary key
QUEUE_TIME
This comes from the SUBMIT_TIME field in the raw table. It is in the GMT time zone.
Primary key
JOB_ID
This comes from the JOB_ID field in the raw table.
Primary key
JOB_ARRAY_INDEX
This comes from the JOB_ARRAY_INDEX field in the raw table.
Primary key
FINISH_TIME
This comes from the END_TIME field in the raw table. It is in the GMT time zone.
Primary key
USER_CODE

This comes from the USER_NAME field in the raw table.

Once we have the user name, we look it up in the wi_usercode table to see whether a record for it already exists. If it does, we use the existing code; otherwise, we insert the name into the wi_usercode table and generate a new code. The code is a positive integer, and each new code equals the current maximum sequence number plus 1.

PROJECT_CODE

This comes from the PROJECT_NAME field in the raw table. If the project name is null, it is set to "-".

Once we have the project name, we look it up in the wi_projectcode table to see whether a record for it already exists. If it does, we use the existing code; otherwise, we insert the name into the wi_projectcode table and generate a new code. The code is a positive integer, and each new code equals the current maximum sequence number plus 1.

RESOURCE_CODE

This comes from the RES_REQ field in the raw table. If this field is null, the record is filtered out.

Once we have the resource name, we look it up in the wi_resourcecode table to see whether a record for it already exists. If it does, we use the existing code; otherwise, we insert the name into the wi_resourcecode table and generate a new code. The code is a positive integer, and each new code equals the current maximum sequence number plus 1.
Primary key
RESERVE_VALUE
This is the total amount of the resource (resource_name) used by this job.

RES_USAGE
The usage is calculated as reserve_value * (finish_time - start_time) / 60. It is expressed in minutes.

RES_COST
Set to 0.0

START_TIME
This comes from the "Start_time" field in the raw table. It is in the GMT time zone.

INSERT_SEQ
This is a system-generated sequence number. Each newly inserted record is assigned a unique sequence number in this column.
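
The code lookup used for the CLUSTER_CODE, USER_CODE, PROJECT_CODE and RESOURCE_CODE columns, and the RES_USAGE formula, can be sketched in Python as follows. This is only an illustration under stated assumptions: a plain dictionary stands in for a code table such as wi_clustercode, the first code in an empty table is assumed to be 1, the times are assumed to be in seconds, and the function names get_or_create_code and res_usage_minutes are hypothetical.

```python
def get_or_create_code(code_table, name):
    """Return the integer code for name, assigning max + 1 if it is new.

    code_table is a plain dict (name -> code) standing in for a code table
    such as wi_clustercode; this is an assumption, not the real storage.
    """
    if name in code_table:
        return code_table[name]
    # New name: next code is the current maximum plus 1 (1 if the table is empty).
    new_code = max(code_table.values(), default=0) + 1
    code_table[name] = new_code
    return new_code


def res_usage_minutes(reserve_value, start_time, finish_time):
    """reserve_value * (finish_time - start_time) / 60, with times assumed in seconds."""
    return reserve_value * (finish_time - start_time) / 60
```

For example, looking up the same name twice returns the same code, while each previously unseen name receives the next integer. A job that reserves 2 units of a resource for 600 seconds yields a usage of 2 * 600 / 60 = 20.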