Use AQUA with Amazon Redshift RA3.xlplus nodes


Amazon Redshift RA3 is the latest generation node type that lets you scale compute and storage for your data warehouses independently. The RA3 node family includes RA3.16xlarge, RA3.4xlarge, and RA3.xlplus nodes for large, medium, and small workloads, respectively. RA3.xlplus, the newest member of the RA3 node family, offers one third of the computing power of RA3.4xlarge at one third of the price. RA3.xlplus is the smallest node in the RA3 family, but it offers the same advanced functionality. It has been widely used in environments with light computing demand, such as QA, data analytics for small teams, or processing smaller datasets.

In 2021, Amazon Redshift launched AQUA (Advanced Query Accelerator) for Amazon Redshift to boost the performance of analytical queries that scan, filter, and aggregate large datasets. AQUA uses AWS-designed processors with the AWS Nitro chip adapter to speed up data encryption and compression, and custom analytical processors implemented in FPGAs to accelerate applications that require text search of a very large dataset, such as marketing and personalization.

Customers have asked us to support AQUA for RA3.xlplus, and we recently launched AQUA for RA3.xlplus nodes. In this post, we continue to build on the post AQUA (Advanced Query Accelerator) – A Speed Boost for Your Amazon Redshift Queries and show that with AQUA support, RA3.xlplus provides the same benefit as the existing supported RA3 nodes in the following areas:

  • Automatically boosting certain types of queries
  • Reducing the impact on your Amazon Redshift cluster by offloading certain queries that scan, filter, and aggregate large datasets to AQUA

Test environment

To test AQUA for RA3.xlplus, we started by creating an RA3.xlplus cluster with the following details:

  • Amazon Redshift cluster – 2-node RA3.xlplus
  • Dataset – 3 TB TPC-DS, 3 TB TPC-H
  • Query set – Sample queries based on the TPC-H and TPC-DS workload

Sample queries

To test AQUA, we created six text search queries that scan, filter, and aggregate the lineitem table in the TPC-H dataset, which has 18 billion rows, with a WHERE clause predicate against the l_comment column.

The following table summarizes our table definition.

| Table | Encoded | Diststyle | Sortkey1 | Rows |
| --- | --- | --- | --- | --- |
| lineitem | Y | KEY | l_shipdate | 18,000,048,306 |

We randomly generated a query set with queries of varying complexity. The queries are designed to measure scan cost, which is an area of focus for AQUA. Each query has a predicate with LIKE and OR. The number of LIKE and OR predicates gets progressively higher to simulate complex workloads.

For example, Query 1 has one OR predicate:

SELECT COUNT(l_orderkey)
FROM lineitem
WHERE (l_comment LIKE '%throughout%') OR (l_comment LIKE '%courageous,%');

In contrast, Query 4 has 50 OR predicates:

SELECT COUNT(l_orderkey)
  FROM lineitem
  WHERE (l_comment LIKE '%outsi%') OR
  (l_comment LIKE '%uthless%') OR
  (l_comment LIKE '%capades%') OR
  (l_comment LIKE '%horses%') OR
  (l_comment LIKE '%ornis%' AND l_comment LIKE '%phins?%') OR
  (l_comment LIKE '%affix%') OR
  (l_comment LIKE '%integrat%') OR
....
  (l_comment LIKE '%ithin%' AND l_comment LIKE '%quiet%') OR
  (l_comment LIKE '%taphs%') OR
  (l_comment LIKE '%dugouts%' AND l_comment LIKE '%ches%') OR
  (l_comment LIKE '%telets%' AND l_comment LIKE '%detect!%') OR
  (l_comment LIKE '%develop%') OR
  (l_comment LIKE '%promise!%') OR
  (l_comment LIKE '%was%') OR
  (l_comment LIKE '%accounts%') OR
  (l_comment LIKE '%idly%' AND l_comment LIKE '%deposits%') OR
  (l_comment LIKE '%combine!%' AND l_comment LIKE '%rely%') OR
  (l_comment LIKE '%ins%' AND l_comment LIKE '%makes use of!%') OR
  (l_comment LIKE '%epitaphs!%' AND l_comment LIKE '%breac%') OR
  (l_comment LIKE '%pliers%' AND l_comment LIKE '%phins%') OR
  (l_comment LIKE '%hogs%' AND l_comment LIKE '%sentiments%') OR
  (l_comment LIKE '%ctions%' AND l_comment LIKE '%daringly%') OR
  (l_comment LIKE '%ies%' AND l_comment LIKE '%esias%');

The following table summarizes the complexity of each query.

| Query Number | Number of OR | Number of LIKE |
| --- | --- | --- |
| Query 1 | 1 | 2 |
| Query 2 | 5 | 7 |
| Query 3 | 10 | 12 |
| Query 4 | 50 | 66 |
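
The post does not show how the query set was generated, so the following Python sketch is only one way such queries could be assembled. It assumes each query is a list of groups, where the terms inside a group are joined with AND and the groups are joined with OR, which matches the shape of the sample queries. The function names `build_text_scan_query` and `complexity` are hypothetical.

```python
from typing import Sequence, Tuple


def build_text_scan_query(groups: Sequence[Sequence[str]]) -> str:
    """Build a COUNT query whose WHERE clause ORs together groups of LIKE terms.

    Each inner sequence holds the substrings that are ANDed inside one
    parenthesized group; the groups themselves are joined with OR.
    Assumption: terms are plain substrings (% and _ act as LIKE wildcards).
    """
    if not groups or any(len(group) == 0 for group in groups):
        raise ValueError("need at least one non-empty group")
    clauses = []
    for group in groups:
        # Double any single quote so the literal stays a valid SQL string.
        likes = [
            "l_comment LIKE '%{}%'".format(term.replace("'", "''"))
            for term in group
        ]
        clauses.append("(" + " AND ".join(likes) + ")")
    where = " OR\n  ".join(clauses)
    return f"SELECT COUNT(l_orderkey)\nFROM lineitem\nWHERE {where};"


def complexity(groups: Sequence[Sequence[str]]) -> Tuple[int, int]:
    """Return (number of OR, number of LIKE) for a list of groups."""
    return len(groups) - 1, sum(len(group) for group in groups)


# Query 1 from the post: two single-term groups joined by one OR.
query_1_groups = [["throughout"], ["courageous,"]]
print(build_text_scan_query(query_1_groups))
print(complexity(query_1_groups))
```

Under this model, the OR count is the number of groups minus one, and the LIKE count is the total number of terms. For instance, 36 single-term groups plus 15 two-term groups would give 50 OR and 66 LIKE predicates, the same totals as Query 4, although the post does not state the actual breakdown.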

Scan performance improvement with AQUA

We ran the four queries sequentially without any other workload on the system. With AQUA, the queries ran roughly 7–13 times faster, as summarized in the following table.

| Query Number | Amazon Redshift with AQUA (seconds) | Amazon Redshift Only (seconds) | Improvement |
| --- | --- | --- | --- |
| Query 1 | 78.53 | 635.89 | 709.74% |
| Query 2 | 92.75 | 810.04 | 773.36% |
| Query 3 | 130.68 | 956.83 | 632.19% |
| Query 4 | 137.68 | 1950.9 | 1316.98% |
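
The post does not spell out how the Improvement column is calculated. The reported percentages are consistent with (runtime without AQUA − runtime with AQUA) / runtime with AQUA, so the Python sketch below assumes that formula. The function name `improvement_pct` is illustrative.

```python
def improvement_pct(redshift_only_s: float, with_aqua_s: float) -> float:
    """Percentage improvement, measured relative to the AQUA runtime."""
    return (redshift_only_s - with_aqua_s) / with_aqua_s * 100.0


# (runtime with AQUA, runtime without AQUA) in seconds, from the table above.
timings = {
    "Query 1": (78.53, 635.89),
    "Query 2": (92.75, 810.04),
    "Query 3": (130.68, 956.83),
    "Query 4": (137.68, 1950.9),
}

for name, (with_aqua, redshift_only) in timings.items():
    speedup = redshift_only / with_aqua
    pct = improvement_pct(redshift_only, with_aqua)
    print(f"{name}: {speedup:.2f}x speedup, {pct:.2f}% improvement")
```

Because the improvement is measured against the AQUA runtime, an improvement of about 710% corresponds to a runtime ratio of about 8.1x.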

AQUA impact on multiple workloads

In this environment, we simulated a multi-user workflow using TPC-DS queries on the Amazon Redshift cluster. We recorded query runtime for three scenarios:

  • Baseline – We measured the end-to-end runtime of running all TPC-DS queries serially on the Amazon Redshift cluster. In this scenario, AQUA was off and no additional workload was run (a single user was on the cluster).
  • Baseline with additional workload – This was the same as the baseline scenario, with an additional workload run in parallel. We simulated a user load by running text scan queries randomly chosen from Query 1, Query 2, and Query 3. These queries have relatively short runtimes. We had two variations of this scenario:
    • AQUA turned off
    • AQUA turned on
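
The post does not describe the test harness behind these scenarios. As a rough Python sketch of the idea, the function below runs a list of baseline queries serially while a second thread keeps submitting randomly chosen text scan queries. The `run_query` callable is a hypothetical stand-in for whatever client actually submits SQL to the cluster, and the function name and parameters are assumptions.

```python
import random
import threading
import time
from typing import Callable, Sequence


def run_with_background_load(
    run_query: Callable[[str], None],
    baseline_queries: Sequence[str],
    background_queries: Sequence[str],
    seed: int = 0,
) -> float:
    """Run baseline queries serially while a second session issues text scans.

    Returns the end-to-end wall-clock time of the baseline run in seconds.
    Passing an empty background_queries list gives the plain baseline scenario.
    """
    rng = random.Random(seed)
    stop = threading.Event()

    def background_user() -> None:
        # Keep submitting randomly chosen text scan queries until the
        # baseline run has finished.
        while not stop.is_set():
            run_query(rng.choice(background_queries))

    worker = None
    if background_queries:
        worker = threading.Thread(target=background_user, daemon=True)

    start = time.perf_counter()
    if worker is not None:
        worker.start()
    try:
        for query in baseline_queries:
            run_query(query)
    finally:
        stop.set()
    # Stop the clock before waiting for the last background query to drain.
    elapsed = time.perf_counter() - start
    if worker is not None:
        worker.join()
    return elapsed
```

Calling this once with an empty background list and once with the three text scan queries, with AQUA off and then on, would produce the three end-to-end timings being compared.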

From the results, we observed the following:

  • With AQUA turned on for all workloads, the impact of a text scan query on the baseline runtime was negligible.
  • Without AQUA, the baseline runtime was impacted by the additional workload created with text scan queries. In our case, overhead was about 31%.

| | Baseline | Baseline with additional workload (AQUA turned off) | Baseline with additional workload (AQUA turned on) | Improvement with AQUA |
| --- | --- | --- | --- | --- |
| TPC-DS End-to-End Time | 3:43:35 | 4:54:50 | 3:44:36 | 31.27% |
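
To make the arithmetic behind these figures explicit, the following Python sketch converts the h:mm:ss timings to seconds and computes the relative differences. The 31.27% in the table matches (time with AQUA off − time with AQUA on) / time with AQUA on. That formula is inferred from the numbers, not stated in the post, and the helper name `to_seconds` is illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


baseline = to_seconds("3:43:35")
aqua_off = to_seconds("4:54:50")
aqua_on = to_seconds("3:44:36")

# Extra end-to-end time caused by the text scan workload, relative to baseline.
overhead_without_aqua = (aqua_off - baseline) / baseline * 100.0
overhead_with_aqua = (aqua_on - baseline) / baseline * 100.0

# Improvement from turning AQUA on, relative to the AQUA-on runtime.
improvement_with_aqua = (aqua_off - aqua_on) / aqua_on * 100.0

print(f"Overhead without AQUA: {overhead_without_aqua:.2f}%")
print(f"Overhead with AQUA:    {overhead_with_aqua:.2f}%")
print(f"Improvement with AQUA: {improvement_with_aqua:.2f}%")
```

Measured against the baseline instead, the overhead without AQUA comes to just under 32%, close to the approximate figure quoted above, while the overhead with AQUA on is well under 1%.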

Single-node RA3.xlplus support

AQUA also supports the recently launched Amazon Redshift single-node RA3.xlplus. In a single-node configuration, the resource is shared among all Amazon Redshift operations, which are traditionally handled separately by a leader node and compute nodes. A single-node configuration is typically used in a personal or small team environment for data exploration.

We ran the same set of queries as before, using Query 1, Query 2, and Query 3. The results demonstrated that AQUA provides a similar level of acceleration for these queries in a single-node environment.

| Query Number | Amazon Redshift with AQUA (seconds) | Amazon Redshift Only (seconds) | Improvement |
| --- | --- | --- | --- |
| Query 1 | 157.91 | 1,254.03 | 694.13% |
| Query 2 | 193.64 | 2,037.79 | 952.36% |
| Query 3 | 260.75 | 2,495.85 | 857.19% |

Summary

In this post, we ran a set of simulated performance tests on the Amazon Redshift RA3.xlplus platform with AQUA. With AQUA on, RA3.xlplus provides the same benefit as previously supported platforms. It provides a query scan performance boost with AQUA-supported operators, which will expand over time. It can also reduce the performance impact on your existing workflow by offloading the scan to AQUA.

We invite you to share your comments and use cases with the Amazon Redshift AQUA team.

For more information about how AQUA accelerates Amazon Redshift, see AQUA (Advanced Query Accelerator) for Amazon Redshift.

For more information about queries accelerated by AQUA, see When does Amazon Redshift use AQUA to run queries?


About the Authors

Quan Li is a Senior Database Engineer at Amazon Redshift. His focus is enabling customers to deliver maximum business value. Quan is passionate about optimizing high-performance analytical databases. During his spare time, he enjoys traveling and experiencing different types of cuisines with his family.

Steffen Rochel is a Sr. Software Development Manager at AWS. He's focused on data analytics acceleration. He has expertise in hardware-software design and operation of large-scale, high-performance distributed systems.