datasetName / datasetId | datasetStatus | readConnector | writeConnector | errorConnector | computeEngine | manifestFile | datasetProgress | executionLogs |
---|---|---|---|---|---|---|---|---|
S3 Artifact File Access Grant Role ARN
Select Compute Engine Type
Select DeleteType
Select Tasks to Redrive
Select Task Redrive Policy
Move stopped tasks to:
taskId | taskStatus | taskProgress | taskCheckpoint | taskExecutionLogs | readerDefinition | computeDefinition | JSON |
---|---|---|---|---|---|---|---|
• Each datapoint is the percentage of tasks that completed successfully at that minute, plotted on the left axis. 1.0 means 100% of tasks succeeded, 0.0 means all tasks that completed at that minute failed.
• Plotted on the right axis is the number of tasks that completed at that minute.
• For example, a value of 0.75 with number of tasks = 4 means that 75% of the 4 tasks completing at that minute succeeded and 25% failed.
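As a sketch of how such a datapoint could be derived (the event shape, names and sample data below are illustrative assumptions, not the actual Lets Data schema):

```python
from collections import defaultdict

# Hypothetical per-task completion events as (minute, succeeded) pairs;
# illustrative data only.
completions = [
    (0, True), (0, True), (0, True), (0, False),  # minute 0: 4 tasks, 3 succeed
    (1, True), (1, True),                         # minute 1: 2 tasks, both succeed
]

by_minute = defaultdict(list)
for minute, succeeded in completions:
    by_minute[minute].append(succeeded)

# Left axis: success fraction (1.0 == 100%); right axis: completing-task count.
results = {
    minute: (sum(outcomes) / len(outcomes), len(outcomes))
    for minute, outcomes in by_minute.items()
}
# results[0] == (0.75, 4): 75% of the 4 completing tasks succeeded
```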
• Each datapoint is the average latency of completion for the tasks that completed at that minute.
• For example, a value of 196,926.0 milliseconds means that the tasks completing at that minute took roughly 197 seconds to complete on average.
• This metric can be correlated with the number-of-tasks metric for that minute in the previous graph.
• For example, if 4 tasks completed at that minute and the average latency is 196,926.0 at that minute, then the sample size for the average is the 4 tasks that completed that minute.
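A minimal sketch of the averaging, using made-up latency values chosen to reproduce the example figure above (the record shape is an assumption):

```python
# Hypothetical (minute, latency_ms) completion records; illustrative data only.
completions = [
    (5, 200_000.0), (5, 190_000.0), (5, 198_704.0), (5, 199_000.0),
]

minute = 5
latencies = [ms for m, ms in completions if m == minute]
avg_latency_ms = sum(latencies) / len(latencies)  # 196,926.0 ms on average
sample_size = len(latencies)  # matches the task count in the previous graph
```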
• Each datapoint is the percentage of task checkpoints that completed successfully at that minute, plotted on the left axis.
• 1.0 means 100% of checkpoints succeeded, 0.0 means all checkpoints at that minute failed.
• Plotted on the right axis is the average latency of each checkpoint in milliseconds.
• A task checkpoints periodically, for example after reading every 500 records from the file. A task can create multiple checkpoints in a minute, and multiple tasks run concurrently; this datapoint is the average across all of these checkpoints.
• While this isn't controllable by the user, knowing how much time each checkpoint takes can help with diagnosing task issues. The #Let's Data team closely monitors this metric to find any DB throttling / DB latency issues. We expect this metric to be within the ~25 ms range.
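To make the checkpoint cadence concrete, here is a small sketch; the 500-record interval is the example figure from the text, not a documented Lets Data constant:

```python
def checkpoint_positions(num_records, interval=500):
    """Record offsets at which a task would checkpoint while reading a file.

    The 500-record interval is the illustrative figure from the text,
    not a documented constant.
    """
    return list(range(interval, num_records + 1, interval))

# A task that reads 1250 records checkpoints twice, at offsets 500 and 1000.
positions = checkpoint_positions(1250)
```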
• Each datapoint is the sum, across all tasks, of the number of records processed (red), skipped (green) and errored (blue) at that minute, plotted on the left axis.
• These are the composite records being sent to the write destination - not necessarily the records read from the file.
• For example, if each task reads data from 2 files and produces one composite document (or a skip / error decision) from multiple records from each file, the metric counts these as 1.
• Each datapoint is the sum of the Write Connector Put API calls made by the tasks at that minute, plotted on the left axis.
• We batch the Write Connector Put API calls, so each call may carry multiple records and may partially succeed / fail (failed records are retried until completion or task error).
• A task calls the Write Connector Put API when the buffered record count reaches the batch size or the buffer size in bytes reaches the maximum allowed threshold.
• In a minute, a task can call the Write Connector Put API multiple times, and multiple tasks run concurrently. This is the sum across all these Write Connector Put API calls.
• Users can use the volume and latency graphs (along with the Write Connector Bytes Written and Put Retries graphs) to determine whether the Write Connector scaling needs fine-tuning.
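The flush-on-count-or-bytes behavior described above can be sketched as follows; the class, constant names and threshold values are assumptions for illustration, not the actual Lets Data implementation or limits:

```python
# Assumed thresholds for illustration only - not real Lets Data limits.
MAX_BATCH_RECORDS = 500
MAX_BATCH_BYTES = 1_000_000

class PutBatcher:
    """Illustrative buffer that flushes when either threshold is reached."""

    def __init__(self):
        self.buffer = []
        self.buffered_bytes = 0
        self.put_calls = 0

    def add(self, record: bytes):
        self.buffer.append(record)
        self.buffered_bytes += len(record)
        # Flush when the batch-size record count or byte threshold is hit.
        if len(self.buffer) >= MAX_BATCH_RECORDS or self.buffered_bytes >= MAX_BATCH_BYTES:
            self.flush()

    def flush(self):
        if self.buffer:
            self.put_calls += 1  # one Put API call per flushed batch
            self.buffer.clear()
            self.buffered_bytes = 0

batcher = PutBatcher()
for _ in range(1200):
    batcher.add(b"x" * 100)  # small records: the count threshold trips first
batcher.flush()              # flush the trailing partial batch of 200 records
# batcher.put_calls == 3: batches of 500, 500 and 200 records
```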
• Each datapoint is the average percentage of Write Connector Put API calls that were retried by the tasks.
• A 0.0 value means that there were no retries to the Write Connector (the Write Connector is adequately scaled; one could even check whether scaling down slightly causes any issues).
• In a minute, a task can call the Write Connector Put API multiple times, and multiple tasks run concurrently. This is the average retry percentage across all these task Put API calls.
• A value of 0.25 means that, on average, 25% of the Write Connector calls were retried across the tasks. If there are 4 tasks running concurrently, this is the average of the retry percentage of each task: either each task could be retrying 25% of the time before the call succeeds (possible Write Connector scaling issues), or 1 task is retrying 100% of the time while 3 tasks are retrying 0% (possibly an issue with that task - this is a contrived example and is unlikely).
• Users can use the Put Retries graph (along with the Write Connector Bytes Written and the volume and latency graphs) to determine whether the Write Connector stream scaling needs fine-tuning.
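The averaging caveat in the contrived example is easy to demonstrate: two very different per-task retry distributions produce the same 0.25 datapoint (the lists below are made-up illustrative data):

```python
# Per-task retry fractions for one minute; both distributions average to 0.25.
uniform_tasks = [0.25, 0.25, 0.25, 0.25]  # every task retries 25% of its calls
skewed_tasks = [1.0, 0.0, 0.0, 0.0]       # one task retries 100%, the rest 0%

avg_uniform = sum(uniform_tasks) / len(uniform_tasks)
avg_skewed = sum(skewed_tasks) / len(skewed_tasks)
# Both equal 0.25, so the graph alone cannot distinguish the two cases.
```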
• Each datapoint plotted on the left axis is the (avg, min and max) latency of extraction of the record by the user handlers (readers and parser).
• Plotted on the right axis is the sample count for the latency metric.
• This is pure CPU work that the parsers and readers do on the bytes from the file to extract the records and create composite records (plus some work that the system does to put them into buffers etc).
• This may be a good metric for finding performance issues with the parser; we expect these latencies to be < 10 ms (min, avg) and < 30 ms (max, for example for large documents).
• Each datapoint plotted on the left axis is the bytes read by the readers from S3 (in KBs).
• In a minute, the readers can read from S3 many times and there could be multiple readers in a task (one per file type). There could be many tasks running concurrently. This is the average KBs read by all readers across all concurrently running tasks.
• Each datapoint plotted on the right axis is the bytes written by the task to Write Connector (in KBs).
• In a minute, the tasks can write to Write Connector multiple times. There could be many tasks running concurrently. This is the average KBs written per min across all concurrently running tasks.
• Users can look at these metrics to reason about the system's throughput and debug any issues that may arise from network read / write.
• We expect max / avg network throughput of XYZ / ABC from S3 file reads. We expect max / avg network throughput of XYZ / ABC to the Write Connector for each shard (multiply by the number of shards).
• Each datapoint on the left axis (red) is the average latency of reading each individual record from the read destination at that minute.
• For example, a value of 196.0 milliseconds means that the tasks took 196 ms to read the next record from read destination on average at that minute.
• Each datapoint on the right axis (green) is the average of the number of records that were read by the tasks at that minute from the read destination.
• For example, if number of messages is 5000, then tasks read 5000 records on average at that minute from the read destination.
• Each datapoint is the average enqueue time in milliseconds that the readers (tasks) took to enqueue to the compute / writer queue.
• This can be used to diagnose slower downstream components (compute, writers), which can cause the queue to fill up, leaving the reader waiting for space to become available in the queue.
• Each datapoint is the average wait time in milliseconds for a record in the writer queue. This can be used to diagnose the throughput issues.
• For example, a large wait time might require increasing write resources to increase the write throughput.
• Each datapoint is the average dequeue time in milliseconds to dequeue a message from the writer queue. A high dequeue value means that the writer is waiting on an empty queue.
• This is to detect issues where compute may not be enqueuing messages fast enough in cases where compute endpoints are not adequately scaled.
• Each datapoint is the average time writers took to pre-process the message. This could be deserialization, error doc creation or no preprocessing at all.
• A high value means that writers are doing additional processing prior to writing to the destination, which could be an issue.
• There isn't much preprocessing done by the writers, so we expect this to be less than 5 milliseconds.
• Each datapoint on the left axis (red) is the average latency of the write for each individual record at that minute. For destinations that support batched writes, this is the latency of the batched call whereas for multi-threaded batching by Lets Data, this is the latency of each message's write call.
• For example, a value of 196.0 milliseconds means that the writers took 196 ms to write each record on average at that minute.
• Each datapoint on the right axis (green) is the average of the number of records that were written by the writers at that minute.
• For example, if number of messages is 5000, then writers processed 5000 records on average at that minute for the write destination.
• Each datapoint on the left axis (red) is the average total latency for each individual record at that minute and includes read, queueing, compute and write times.
• For example, a value of 196.0 milliseconds means that the record took 196 ms total time on average at that minute to be processed end to end by the task.
• Each datapoint on the right axis (green) is the average of the number of records that were processed by the task at that minute.
• For example, if the number of messages is 5000, then the task processed 5000 records on average at that minute.
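Since the total latency includes read, queueing, compute and write times, the example datapoint decomposes into stage latencies; the stage values below are made-up illustrative numbers:

```python
# Hypothetical per-record stage timings (ms); values are illustrative only.
# The end-to-end datapoint is the sum of the stage latencies.
stage_latency_ms = {"read": 40.0, "queueing": 6.0, "compute": 100.0, "write": 50.0}
total_latency_ms = sum(stage_latency_ms.values())  # 196.0 ms end to end
```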
taskId | taskStatus | logs |
---|---|---|
taskId | taskStatus | numberOfErrors | errors |
---|---|---|---|
resourceType | resource | eventStartTime | eventEndTime | meteringDimension | meteringUnit | meteringValue | billedStatus |
---|---|---|---|---|---|---|---|
fullName | emailAddress | phone | userRole | userStatus | JSON |
---|---|---|---|---|---|
Full Name
Email Address
Phone
User Role
Email Address
Attribute To Update
Attribute Existing Value
Attribute New Value
Email Address
ccBrand | ccLastFour | ccExpiry | paymentMethodType |
---|---|---|---|
Price Name | Product | Description | Unit Amount |
---|---|---|---|
Start Time | End Time | Status | Due Date | Currency | Tax | Total | Amount Due | Amount Paid | Charge | Pay Now | Invoice Line Items |
---|---|---|---|---|---|---|---|---|---|---|---|
Id Token
Access Token
User Profile