Best practices to optimize your Amazon Redshift and MicroStrategy deployment


This is a guest blog post co-written by Amit Nayak at MicroStrategy. In their own words, “MicroStrategy is the largest independent publicly traded business intelligence (BI) company, with the leading enterprise analytics platform. Our vision is to enable Intelligence Everywhere. MicroStrategy provides modern analytics on an open, comprehensive enterprise platform used by many of the world’s most admired brands in the Fortune Global 500. Optimized for cloud and on-premises deployments, the platform features HyperIntelligence, a breakthrough technology that overlays actionable enterprise data on popular business applications to help users make smarter, faster decisions.”

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse. It provides a simple and cost-effective way to analyze all your data using your existing BI tools. Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and by parallelizing queries across multiple nodes. Amazon Redshift has custom JDBC and ODBC drivers that you can download from the Connect Client tab on the Amazon Redshift console, allowing you to use a wide range of familiar BI tools.

When using your MicroStrategy application with Amazon Redshift, it’s important to understand how to optimize Amazon Redshift to get the best performance and meet your workload SLAs.

In this post, we look at the best practices for an optimized deployment of MicroStrategy using Amazon Redshift.

Optimize Amazon Redshift

In this section, we discuss ways to optimize Amazon Redshift.

Amazon Redshift RA3 instances

RA3 nodes with managed storage help you optimize your data warehouse by scaling and paying for compute capacity and managed storage independently. With RA3 instances, you can choose the number of nodes based on your performance requirements, and pay only for the managed storage that you use. Size your RA3 cluster based on the amount of data you process daily without increasing your storage costs.

For more details about RA3 features, see Amazon Redshift RA3 instances with managed storage.

Distribution styles

When you load data into a table, Amazon Redshift distributes the rows of the table to each of the compute nodes according to the table’s distribution style. When you run a query, the query optimizer redistributes the rows to the compute nodes as needed to perform any joins and aggregations. The goal in selecting a table distribution style is to minimize the impact of the redistribution step by locating the data where it needs to be before the query is run.

When you create a table, you can designate one of four distribution styles: AUTO, EVEN, KEY, or ALL. If you don’t specify a distribution style, Amazon Redshift uses AUTO distribution. With AUTO distribution, Amazon Redshift assigns an optimal distribution style based on the size of the table data. You can use automatic table optimization to get started with Amazon Redshift easily, or to optimize production workloads while decreasing the administrative effort required to get the best possible performance.
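As a rough illustration of the four styles, the following sketch creates a few tables in Amazon Redshift SQL. The table and column names (stg_sales, sales_fact, store_dim, region_dim, web_events) are hypothetical; which style actually fits a real table depends on its size and join patterns.

```sql
-- Hypothetical tables, for illustration only.

-- AUTO (also the default when DISTSTYLE is omitted): Redshift picks the style
-- and can change it as the table grows.
create table stg_sales (
    sale_id   bigint,
    store_id  integer,
    sale_date date,
    amount    decimal(12,2)
)
diststyle auto;

-- KEY: distribute a large fact table and a large dimension on the column they
-- are joined on, so rows with the same store_id are stored on the same slice.
create table sales_fact (
    sale_id   bigint,
    store_id  integer,
    sale_date date,
    amount    decimal(12,2)
)
diststyle key distkey (store_id);

create table store_dim (
    store_id   integer,
    store_name varchar(100)
)
diststyle key distkey (store_id);

-- ALL: keep a full copy of a small, slowly changing dimension on every node.
create table region_dim (
    region_id   integer,
    region_name varchar(50)
)
diststyle all;

-- EVEN: spread rows round-robin across slices when no join column dominates.
create table web_events (
    event_id   bigint,
    event_time timestamp,
    payload    varchar(1000)
)
diststyle even;
```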

MicroStrategy, like any SQL application, transparently takes advantage of the distribution style defined on base tables. MicroStrategy recommends following the Amazon Redshift best practices when implementing the physical schema of the base tables.

Sort keys

Defining a table with a sort key results in the physical ordering of data on the Amazon Redshift cluster nodes based on the sort type and the columns chosen in the key definition. Sorting enables efficient handling of range-restricted predicates, so that a query scans the minimum number of disk blocks needed to satisfy it. A contrived example would be an orders table with 5 years of data and a SORTKEY on the order_date column. Now suppose a query on the orders table specifies a date range of 1 month on the order_date column. One month is about 1/60 (roughly 1.7%) of the data, so in this case you can eliminate up to 98% of the disk blocks from the scan. If the data isn’t sorted, more of the disk blocks (possibly all of them) have to be scanned, and the query runs longer.
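A minimal sketch of the contrived orders example follows; the table definition and dates are hypothetical. The range predicate is on the leading sort key column, which is what lets Amazon Redshift compare it against the per-block minimum and maximum values (zone maps) and skip blocks outside the month.

```sql
-- Hypothetical orders table holding about 5 years of data,
-- physically sorted on order_date.
create table orders (
    order_id    bigint,
    customer_id integer,
    order_date  date,
    amount      decimal(12,2)
)
compound sortkey (order_date);

-- A 1-month range on the sort key column: blocks whose min/max order_date
-- values fall outside January 2021 can be skipped without being read.
select count(*) as order_count, sum(amount) as total_amount
from orders
where order_date between '2021-01-01' and '2021-01-31';
```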

We recommend creating your tables with SORTKEY AUTO. This way, Amazon Redshift uses automatic table optimization to choose the sort key. If Amazon Redshift determines that applying a SORTKEY improves cluster performance, tables are automatically altered within hours from the time the cluster was created, with minimal impact to queries.
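The following sketch, using hypothetical table names, shows SORTKEY AUTO on a new table, hands an existing table’s sort key and distribution choices back to Amazon Redshift, and checks the result in SVV_TABLE_INFO. Note that SVV_TABLE_INFO only lists tables that contain data.

```sql
-- New table: let automatic table optimization choose the sort key
-- (and the distribution style).
create table orders_auto (
    order_id    bigint,
    customer_id integer,
    order_date  date,
    amount      decimal(12,2)
)
diststyle auto
sortkey auto;

-- Existing (hypothetical) table: switch both choices to automatic.
alter table orders alter sortkey auto;
alter table orders alter diststyle auto;

-- Check what Redshift is currently using (non-empty tables only).
select "table", diststyle, sortkey1
from svv_table_info
where "table" in ('orders', 'orders_auto');
```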

We also recommend using the sort key on columns often used in the WHERE clause of the report queries. Keep in mind that SQL functions (such as data transformation functions) applied to sort key columns in queries reduce the effectiveness of the sort key for those queries. Instead, apply the functions to the compared values so that the sort key is used. This pattern is common with DATE columns that are used as sort keys.
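For example, with a hypothetical orders table sorted on order_date, the first query below wraps the sort key column in a function, and the second applies the range to the compared values instead. Both compute the January 2021 total; the second keeps order_date bare so that the sort key can help skip blocks.

```sql
-- Less effective: the function is applied to the sort key column itself,
-- so Redshift may not be able to use the column's block ranges to skip data.
select sum(amount)
from orders
where date_trunc('month', order_date) = '2021-01-01';

-- Preferred: leave order_date untouched and express the same month
-- as a half-open range on the compared values.
select sum(amount)
from orders
where order_date >= '2021-01-01'
  and order_date <  '2021-02-01';
```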

Amazon Redshift Advisor provides recommendations to help you improve performance and decrease the operating costs of your Amazon Redshift cluster. Advisor analyzes your cluster’s workload to identify the most appropriate distribution key and sort key based on the query patterns of your cluster.


Compression

Compression settings can also play a big role in query performance in Amazon Redshift. Compression conserves storage space and reduces the size of data that is read from storage, which reduces the amount of disk I/O and therefore improves query performance.

By default, Amazon Redshift automatically manages compression encoding for all columns in a table. You can specify the ENCODE AUTO option for the table to enable Amazon Redshift to automatically manage compression encoding for all its columns. Alternatively, you can apply a specific compression type to the columns in a table manually when you create the table, or you can use the COPY command to analyze and apply compression automatically.

We don’t recommend compressing the first column in a compound sort key, because it might result in scanning more rows than expected.
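The following sketch (hypothetical tables) contrasts automatic encoding with manual encodings that leave the leading sort key column uncompressed (RAW), and uses ANALYZE COMPRESSION to get suggested encodings for a table that already holds data. The encodings chosen here (AZ64, ZSTD) are illustrative, not tuned recommendations.

```sql
-- Automatic encoding for every column.
create table orders_enc_auto (
    order_id   bigint,
    order_date date,
    amount     decimal(12,2)
)
compound sortkey (order_date)
encode auto;

-- Manual encodings: the first (leading) sort key column stays RAW,
-- the remaining columns are compressed.
create table orders_enc_manual (
    order_date  date          encode raw,
    order_id    bigint        encode az64,
    customer_id integer       encode az64,
    status      varchar(20)   encode zstd
)
compound sortkey (order_date);

-- Report suggested encodings for a populated table
-- (samples rows; does not change the table definition).
analyze compression orders_enc_manual;
```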

Amazon Redshift materialized views

Materialized views can significantly improve query performance for repeated and predictable analytical workloads such as dashboarding, queries from BI tools, and extract, load, and transform (ELT) data processing.

Materialized views are especially useful for queries that are predictable and repeated over and over. Instead of performing resource-intensive queries on large tables, applications can query the pre-computed data stored in the materialized view.

For example, consider the scenario where a set of queries is used to populate a collection of charts for a dashboard. This use case is ideal for a materialized view, because the queries are predictable and run repeatedly. Whenever a change occurs in the base tables (data is inserted, deleted, or updated), the materialized views can be automatically or manually refreshed to represent the current data.

Amazon Redshift can automatically refresh materialized views with up-to-date data from their base tables when the materialized views are created with, or altered to have, the auto-refresh option. Amazon Redshift auto-refreshes materialized views as soon as possible after the base tables change.

To update the data in a materialized view manually, you can use the REFRESH MATERIALIZED VIEW statement at any time. There are two strategies for refreshing a materialized view:

  • Incremental refresh – In an incremental refresh, Amazon Redshift identifies the changes to the data in the base tables since the last refresh and updates the data in the materialized view
  • Full refresh – If incremental refresh isn’t possible, Amazon Redshift performs a full refresh, which reruns the underlying SQL statement, replacing all the data in the materialized view

Amazon Redshift automatically chooses the refresh method for a materialized view depending on the SELECT query used to define the materialized view. For information about the restrictions on incremental refresh, see Limitations for incremental refresh.
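A minimal sketch of both refresh paths follows, assuming a hypothetical orders table with order_date and amount columns. The materialized view uses only COUNT and SUM aggregates, which keeps it a likely candidate for incremental refresh; SVL_MV_REFRESH_STATUS records how each refresh ran.

```sql
-- Hypothetical materialized view, created with auto refresh enabled.
create materialized view mv_daily_sales
auto refresh yes
as
select order_date,
       count(*)    as order_count,
       sum(amount) as total_amount
from orders
group by order_date;

-- Auto refresh can be switched off (or back on) later.
alter materialized view mv_daily_sales auto refresh no;

-- Manual refresh, for example at the end of an ETL job. Redshift decides
-- whether this run is incremental or full.
refresh materialized view mv_daily_sales;

-- Recent refreshes and their outcome messages.
select mv_name, starttime, endtime, status
from svl_mv_refresh_status
where mv_name = 'mv_daily_sales'
order by starttime desc;
```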

The following are some of the key advantages of using materialized views:

  • You can speed up queries by pre-computing the results of complex queries, including multiple base tables, predicates, joins, and aggregates
  • You can simplify and accelerate ETL and BI pipelines
  • Materialized views support Amazon Redshift local, Amazon Redshift Spectrum, and federated queries
  • Amazon Redshift can automatically rewrite queries to use materialized views

For example, let’s say the sales team wants to build a report that shows product sales across different stores. This dashboard query is based on a 3 TB Cloud DW benchmark dataset derived from the TPC-DS benchmark dataset.

In this first step, you create a regular view. See the following code:

create view vw_product_sales
as
select
	i_brand,
	i_category,
	sum(ss_sales_price) as total_sales_price,
	sum(ss_net_profit) as total_net_profit,
	sum(ss_quantity) as total_quantity
from
store_sales ss, item i, date_dim d, store s
where ss.ss_item_sk=i.i_item_sk
and ss.ss_store_sk = s.s_store_sk
and ss.ss_sold_date_sk=d.d_date_sk
and d_year = 2000
group by i_brand,
	i_category;

The following code is a report to analyze the product sales by category:

SELECT i_brand, i_category,
    sum(total_quantity) as total_quantity
FROM vw_product_sales
GROUP BY i_brand, i_category
ORDER BY 3 desc;

The preceding reports take approximately 15 seconds to run. As more products are sold, this elapsed time gradually gets longer. To speed up these reports, you can create a materialized view to precompute the total sales per category. See the following code:

create materialized view mv_product_sales
as
select
	i_brand,
	i_category,
	sum(ss_sales_price) as total_sales_price,
	sum(ss_net_profit) as total_net_profit,
	sum(ss_quantity) as total_quantity
from
store_sales ss, item i, date_dim d, store s
where ss.ss_item_sk=i.i_item_sk
and ss.ss_store_sk = s.s_store_sk
and ss.ss_sold_date_sk=d.d_date_sk
and d_year = 2000
group by i_brand,
	i_category;

The following code analyzes the product sales by category against the materialized view:

SELECT i_brand, i_category,
    sum(total_quantity) as total_quantity
FROM mv_product_sales
GROUP BY i_brand, i_category
ORDER BY 3 desc;

The same reports against the materialized view took around 4 seconds, because the new queries access precomputed joins, filters, grouping, and partial sums instead of the multiple, larger base tables.

Workload management

Amazon Redshift workload management (WLM) enables you to flexibly manage priorities within workloads so that short, fast-running queries don’t get stuck in queues behind long-running queries. You can use WLM to define multiple query queues and route queries to the appropriate queues at runtime.

You can run WLM in two modes:

  • Automatic WLM – Amazon Redshift manages the resources required to run queries. Amazon Redshift determines how many queries run concurrently and how much memory is allocated to each dispatched query, and it uses sophisticated, trained ML algorithms to make these decisions.
      • Query priority is a feature of automatic WLM that lets you assign priority ranks to different user groups or query groups, to ensure that higher-priority workloads get more resources for consistent query performance, even during busy times. For example, consider a critical dashboard report query that has higher priority than an ETL job. You can assign highest priority to the report query and high priority to the ETL query.
      • No queries are ever starved of resources, and lower-priority queries always complete, but might take longer to finish.
  • Manual WLM – With manual WLM, you can manage system performance by modifying the WLM configuration to create separate queues for long-running queries and short-running queries. You can define up to eight queues to separate workloads from each other. Each queue contains a number of query slots, and each queue is associated with a portion of available memory.
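To see how WLM is treating the current workload, you can query the STV_WLM_QUERY_STATE system table. The following sketch lists queued and running queries with their service class (queue) and assigned priority; queue_time and exec_time are reported in microseconds.

```sql
-- Queries currently queued or running under WLM, with their queue
-- (service class) and priority. Times are in microseconds.
select query,
       service_class,
       trim(state)          as state,
       trim(query_priority) as query_priority,
       queue_time,
       exec_time
from stv_wlm_query_state
order by service_class, queue_time desc;
```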

You can also use the Amazon Redshift query monitoring rules (QMR) feature to set metrics-based performance boundaries for WLM queues, and specify what action to take when a query goes beyond those boundaries. For example, for a queue dedicated to short-running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops. You can use predefined rule templates in Amazon Redshift to get started with QMR.
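After QMR rules are in place, the actions they take (such as log or abort) are recorded in the STL_WLM_RULE_ACTION system table. A sketch for reviewing recent rule hits:

```sql
-- Most recent queries that triggered a query monitoring rule,
-- the rule name, and the action taken.
select query,
       service_class,
       trim(rule)   as rule_name,
       trim(action) as action,
       recordtime
from stl_wlm_rule_action
order by recordtime desc
limit 20;
```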

We recommend the following configuration for WLM:

  • Enable automatic WLM
  • Enable concurrency scaling to handle an increase in concurrent read queries with consistently fast query performance
  • Create QMR rules to track and handle poorly written queries

After you create and configure different WLM queues, you can use a MicroStrategy query label to set the Amazon Redshift query group for queue assignment. This tells Amazon Redshift which WLM queue to send the query to.

You can set the following as a report pre-statement in MicroStrategy:

set query_group to 'mstr_dashboard';

You can also use MicroStrategy query labels to identify MicroStrategy-submitted SQL statements within Amazon Redshift system tables.

You can use a query label with all SQL statement types; therefore, we recommend using it for multi-pass SQL reports. When the label of a query is stored in the system view stl_query, it’s truncated to 15 characters (30 characters are stored in all other system tables). For this reason, choose the query label value carefully.
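For example, the following sketch lists recent statements in STL_QUERY whose label starts with MSTR (case-insensitively, so a label such as mstr_dashboard also matches). Because of the 15-character truncation, it filters on a short prefix rather than on the full label.

```sql
-- Recent statements submitted with a MicroStrategy query group label.
-- The label is truncated in STL_QUERY, so match on a short prefix.
select query,
       trim(label) as query_label,
       starttime,
       endtime
from stl_query
where label ilike 'mstr%'
order by starttime desc
limit 50;
```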

You can also set the following as a report pre-statement:

set query_group to 'MSTR=!o;Project=!p;User=!u;Job=!j;'

This collects information on the server side about variables like project name, report name, user, and more.

To clean up the query group and release resources, use the cleanup post-statement:

reset query_group;

MicroStrategy allows the use of wildcards that are replaced by values retrieved at a report’s run time, as shown in the pre- and post-statements. The following table provides an example of pre- and post-statements.

VLDB Category | VLDB Property Setting | Value Example
Pre/Post Statements | Report Pre-statement | set query_group to 'MSTR=!o;Project=!p;User=!u;Job=!j;'
Pre/Post Statements | Cleanup Post-statement | reset query_group;

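For example, with the following Report Pre Statement VLDB property, MicroStrategy replaces the !o wildcard with the report name at run time:

VLDB Property Report Pre Statement = set query_group to 'MSTRReport=!o;'

For the report Cost, Price, and Profit per Unit, the statement sent to Amazon Redshift is:

set query_group to 'MSTRReport=Cost, Price, and Profit per Unit;'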

Query prioritization in MicroStrategy

Often, multiple applications submit queries to Amazon Redshift in addition to MicroStrategy. You can use Amazon Redshift query groups to identify the SQL that MicroStrategy submits to Amazon Redshift and to assign it to the appropriate Amazon Redshift WLM queue.

The Amazon Redshift query group for a MicroStrategy report is set and reset through the following report-level MicroStrategy VLDB properties.

VLDB Category | VLDB Property Setting | Value Example
Pre/Post Statements | Report Pre-statement | set query_group to 'MSTR_HIGH=!o;'
Pre/Post Statements | Cleanup Post-statement | reset query_group;

A MicroStrategy report job can submit multiple queries to Amazon Redshift. In such cases, all queries for a MicroStrategy report are labeled with the same query group and are therefore assigned to the same queue in Amazon Redshift.

The following is an example implementation of MicroStrategy Amazon Redshift WLM:

  • High-priority MicroStrategy reports are set with the report pre-statement MSTR_HIGH=!o;, medium-priority reports with MSTR_MEDIUM=!o;, and low-priority reports with MSTR_LOW=!o;.
  • Amazon Redshift WLM queues are created and associated with the corresponding query groups. For example, the MSTR_HIGH_QUEUE queue is associated with the MSTR_HIGH=*; query group (where * is an Amazon Redshift wildcard).
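One way to check this routing, sketched below, is to set a query group by hand in a SQL client, run a query against a user table, and then look up the queue that handled it. Queries that touch only catalog tables run on the leader node and bypass WLM queues, so the test query must read a regular table. The report name and the sales_fact table are hypothetical.

```sql
-- Mimic a high-priority MicroStrategy report from a SQL client.
set query_group to 'MSTR_HIGH=Executive Dashboard;';
select count(*) from sales_fact;   -- hypothetical user table
reset query_group;

-- Which WLM queue handled recently labeled queries?
select w.query,
       trim(c.name)  as queue_name,
       trim(q.label) as query_label
from stl_wlm_query w
join stl_query q
  on q.query = w.query
join stv_wlm_service_class_config c
  on c.service_class = w.service_class
where q.label like 'MSTR_HIGH=%'
order by w.query desc
limit 10;
```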

Concurrency scaling

With concurrency scaling, you can configure Amazon Redshift to handle spikes in workloads while maintaining consistent SLAs by elastically scaling the underlying resources as needed. When concurrency scaling is enabled, Amazon Redshift continuously monitors the designated workload. If queries start to get backlogged because of bursts of user activity, Amazon Redshift automatically adds transient cluster capacity and routes the requests to these new clusters. You manage which queries are sent to the concurrency scaling cluster by configuring the WLM queues. This happens transparently in a matter of seconds, so your queries continue to be served with low latency. In addition, for every 24 hours that the Amazon Redshift main cluster is in use, you accrue a 1-hour credit toward using concurrency scaling. This enables 97% of Amazon Redshift customers to benefit from concurrency scaling at no additional charge.

For more details on concurrency scaling pricing, see Amazon Redshift pricing.

Amazon Redshift removes the additional transient capacity automatically when activity on the cluster decreases. You can enable concurrency scaling for the MicroStrategy report queue; under heavy load on the cluster, the queries then run on a concurrency scaling cluster, which improves overall dashboard performance and maintains a consistent user experience.

To make concurrency scaling work with MicroStrategy, use derived tables instead of temporary tables, which you can do by setting the VLDB property Intermediate table type to Derived table.

In the following example, we enable concurrency scaling on the Amazon Redshift cluster for the MicroStrategy dashboard queries. We create a user group in Amazon Redshift, and all the dashboard queries are allocated to this user group’s queue. With concurrency scaling in place for the report queries, we see a significant reduction in query wait time.
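The user group setup for this example might look like the following sketch. The group name mstr_dashboard and the user mstr_dashboard_svc (standing in for the database user that the MicroStrategy dashboard connection signs in as) are hypothetical; the group is then listed as a user group on the dashboard queue in the WLM configuration.

```sql
-- Hypothetical group for dashboard traffic.
create group mstr_dashboard;

-- Add the (hypothetical) database user used by the MicroStrategy
-- dashboard connection, so its queries are routed to that group's queue.
alter group mstr_dashboard add user mstr_dashboard_svc;
```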

For this example, we created one WLM queue to run our dashboard queries with highest priority and another ETL queue with high priority. Concurrency scaling is turned on for the dashboard queue, as shown in the following screenshot.

As part of this test, we ran several queries in parallel on the cluster, some of which are ETL jobs (insert, delete, update, and copy) and some of which are complex select queries, such as dashboard queries. The following graph illustrates how many queries are waiting in the WLM queues and how concurrency scaling helps to handle those queries.

In the preceding graph, several queries are waiting in the WLM queues; concurrency scaling automatically starts within seconds to process the queries without delays, as shown in the following graph.

This example has demonstrated how concurrency scaling helps handle spikes in user workloads by adding transient clusters as needed to provide consistent performance, even as the workload grows to hundreds of concurrent queries.
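To confirm from the database side that queries were offloaded, the following sketch uses the concurrency_scaling_status column of STL_QUERY (a value of 1 means the query ran on a concurrency scaling cluster) and the SVCS_CONCURRENCY_SCALING_USAGE view for usage per period.

```sql
-- Did queries from the last hour run on a concurrency scaling cluster?
-- concurrency_scaling_status = 1: scaling cluster; other values: main cluster.
select query,
       trim(label) as query_label,
       concurrency_scaling_status,
       starttime
from stl_query
where starttime >= dateadd(hour, -1, getdate())
order by starttime desc;

-- Concurrency scaling usage per period.
select start_time, end_time, queries, usage_in_seconds
from svcs_concurrency_scaling_usage
order by start_time desc
limit 10;
```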

Amazon Redshift federated queries

Customers using MicroStrategy often connect various relational data sources to a single MicroStrategy project for reporting and analysis. For example, you might integrate an operational (OLTP) data source (such as Amazon Aurora PostgreSQL) with data warehouse data to get meaningful insights into your business.

With federated queries in Amazon Redshift, you can query and analyze data across operational databases, data warehouses, and data lakes. The federated query feature allows you to combine queries from Amazon Redshift on live data in external databases with queries across your Amazon Redshift and Amazon Simple Storage Service (Amazon S3) environments.

Federated queries help incorporate live data as part of your MicroStrategy reporting and analysis, without the need to connect to multiple relational data sources from MicroStrategy.

You can also use federated queries with MySQL.

This simplifies the multi-source reports use case by making it possible to run queries on both operational and analytical data sources, without the need to explicitly connect to and import data from different data sources within MicroStrategy.
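As a sketch of the Aurora PostgreSQL scenario, the following creates an external schema for a live operational database and joins it with a local warehouse table. The endpoint, database name, IAM role ARN, Secrets Manager ARN, and table names are all placeholders. For a MySQL source, the statement uses FROM MYSQL instead of FROM POSTGRES, without the SCHEMA clause.

```sql
-- Placeholders throughout: replace endpoint, database, role, and secret.
create external schema ops_pg
from postgres
database 'ordersdb' schema 'public'
uri 'my-aurora-cluster.cluster-abc123example.us-east-1.rds.amazonaws.com'
port 5432
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftFederatedRole'
secret_arn 'arn:aws:secretsmanager:us-east-1:123456789012:secret:my-aurora-secret';

-- Join live operational rows with local warehouse history in one statement.
select o.order_id,
       o.status,
       h.lifetime_value
from ops_pg.orders o
join customer_history h
  on h.customer_id = o.customer_id
where o.order_date >= current_date - 1;
```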

Redshift Spectrum

The MicroStrategy Amazon Redshift connector includes support for Redshift Spectrum, so you can connect directly to query data in Amazon Redshift and analyze it in conjunction with data in Amazon S3.

With Redshift Spectrum, you can efficiently query and retrieve structured and semi-structured data (such as Parquet, JSON, and CSV) from files in Amazon S3 without having to load the data into Amazon Redshift tables. It allows customers with large datasets stored in Amazon S3 to query that data from within the Amazon Redshift cluster using Amazon Redshift SQL queries with no data movement, and you pay only for the data scanned. Redshift Spectrum also allows multiple Amazon Redshift clusters to concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster. Based on the demands of the queries, Redshift Spectrum can intelligently scale out to take advantage of massively parallel processing.

Use cases that might benefit from Redshift Spectrum include:

  • A large volume of less frequently accessed data
  • Heavy scan-intensive and aggregation-intensive queries
  • Selective queries that can use partition pruning and predicate pushdown, so the output is fairly small

Redshift Spectrum gives you the freedom to store your data where you want, in the format you want, and have it available for processing when you need it.

With Redshift Spectrum, you take advantage of a fast, cost-effective engine that minimizes the data processed with dynamic partition pruning. You can further improve query performance by reducing the amount of data scanned, for example by partitioning and compressing data and by using a columnar storage format.
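The following sketch shows the layout that paragraph describes: Parquet files in Amazon S3, partitioned by date and registered through an external schema. The AWS Glue database, S3 bucket, IAM role, and table names are hypothetical.

```sql
-- External schema backed by the AWS Glue Data Catalog (hypothetical names).
create external schema spectrum_sales
from data catalog
database 'sales_lake'
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole'
create external database if not exists;

-- Columnar (Parquet) files, partitioned by sale date.
create external table spectrum_sales.sales_history (
    sale_id  bigint,
    store_id integer,
    amount   decimal(12,2)
)
partitioned by (sale_date date)
stored as parquet
location 's3://my-example-bucket/sales_history/';

-- Register one partition (one S3 prefix per date).
alter table spectrum_sales.sales_history
add if not exists partition (sale_date = '2021-01-01')
location 's3://my-example-bucket/sales_history/sale_date=2021-01-01/';

-- A filter on the partition column lets Spectrum skip other partitions' files.
select store_id, sum(amount) as total_amount
from spectrum_sales.sales_history
where sale_date between '2021-01-01' and '2021-01-31'
group by store_id;
```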

For more details on how to optimize Redshift Spectrum query performance and cost, see Best Practices for Amazon Redshift Spectrum.

Optimize MicroStrategy

In this section, we discuss ways to optimize MicroStrategy.

SQL optimizations

With MicroStrategy 2021, MicroStrategy delivered support for 70 new advanced customizable functions to enhance usability and capability, especially compared to the previously existing Apply functions. Application architects can customize the functions and make them available for regular users such as business analysts. For more information on how to use these new customizable functions, visit the MicroStrategy community site.

SQL Global Optimization

This setting can substantially reduce the number of SQL passes generated by MicroStrategy. In MicroStrategy, SQL Global Optimization reduces the total number of SQL passes with the following optimizations:

  • Eliminates unused SQL passes – For example, a temp table is created but not referenced in a later pass
  • Reuses redundant SQL passes – For example, the exact same temp table is created multiple times when a single temp table would be enough
  • Combines SQL passes where the SELECT list is different – For example, two temp tables that have the same FROM clause, joins, WHERE clause, and GROUP BY have their SELECT lists combined into a single SELECT statement
  • Combines SQL passes where the WHERE clause is different – For example, for two temp tables that have the same SELECT list, FROM clause, joins, and GROUP BY, the predicates from the WHERE clause are moved into CASE statements in the SELECT list

The default setting for Amazon Redshift is to enable SQL Global Optimization at its highest level. If your database instance is configured as an earlier version of Amazon Redshift, you might have to enable this setting manually. For more information, see the MicroStrategy System Administration Guide.

Set Operator Optimization

This setting combines multiple subqueries into a single subquery using set operators (such as UNION, INTERSECT, and EXCEPT). The default setting for Amazon Redshift is to enable Set Operator Optimization.
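For illustration, the following hypothetical queries show the kind of rewrite this setting enables: two subqueries in a filter expressed as one subquery with EXCEPT. This is not MicroStrategy’s exact generated SQL, and the two forms are equivalent only when store_id contains no NULLs.

```sql
-- Hypothetical tables. Two separate subqueries in the filter...
select s.store_id, s.store_name
from store_dim s
where s.store_id in (select store_id from sales_fact
                     where sale_date >= '2021-01-01')
  and s.store_id not in (select store_id from returns_fact
                         where return_date >= '2021-01-01');

-- ...combined into a single subquery with a set operator.
select s.store_id, s.store_name
from store_dim s
where s.store_id in (
    select store_id from sales_fact   where sale_date   >= '2021-01-01'
    except
    select store_id from returns_fact where return_date >= '2021-01-01'
);
```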

SQL query generation

The MicroStrategy query engine can combine multiple passes of SQL that access the same table (typically the main fact table). This can improve performance by eliminating multiple scans of large tables. For example, this feature significantly reduces the number of SQL passes required to process datasets with custom groups.

Technically, the WHERE clauses of different passes are resolved in CASE statements of a single SELECT clause, which doesn’t contain qualifications in the WHERE clause. Generally, this elimination of WHERE clauses causes a full table scan on a large table.
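The following sketch, using a hypothetical sales_fact table with store_id, region, and amount columns, shows the shape of this merge; it illustrates the technique rather than the SQL MicroStrategy actually emits. The merged form reads sales_fact once but without a WHERE clause, which is the full-table-scan trade-off described above. Note also that stores with no rows for a region get NULL in the merged result instead of being absent.

```sql
-- Before: two passes that differ only in their WHERE clause.
create temp table tmp_north as
select store_id, sum(amount) as sales_north
from sales_fact
where region = 'North'
group by store_id;

create temp table tmp_south as
select store_id, sum(amount) as sales_south
from sales_fact
where region = 'South'
group by store_id;

-- After: one pass, with each WHERE predicate moved into a CASE expression.
select store_id,
       sum(case when region = 'North' then amount end) as sales_north,
       sum(case when region = 'South' then amount end) as sales_south
from sales_fact
group by store_id;
```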

In some cases (on a report-by-report basis), this approach can be slower than many highly qualified SELECT statements. Because any performance difference between the approaches depends mostly on the reporting requirement and the implementation in the MicroStrategy application, it’s important to test both options for each dataset to identify the optimal case.

The default behavior is to merge all passes with different WHERE clauses (level 4). We recommend testing each option for this setting, but most commonly the biggest performance improvements (if any) are observed by switching to the option Level 2: Merge Passes with different SELECT.

VLDB Category | VLDB Property Setting | Value
Query Optimizations | SQL Global Optimization | Level 2: Merge Passes with different SELECT

SQL size

As we explained earlier, MicroStrategy tries to submit a single query statement containing the analytics of multiple passes in the derived table syntax. This can lead to sizeable SQL query syntax, and such a statement can exceed the capabilities of the driver or database. For this reason, MicroStrategy governs the size of generated queries and throws an error message if the limit is exceeded. Starting with MicroStrategy 10.9, this value is tuned to current Amazon Redshift capabilities (16 MB, that is, 16 × 1,024 × 1,024 = 16,777,216 bytes). Earlier versions specify a smaller limit that can be modified using the following VLDB setting on the Amazon Redshift DB instance in Developer.

VLDB Category | VLDB Property Setting | Value
Governing | SQL Size/MDX Size | 16777216

Subquery type

There are many cases in which the SQL engine generates subqueries (query blocks in the WHERE clause):

  • Reports that use relationship filters
  • Reports that use NOT IN set qualification, such as AND NOT
  • Reports that use attribute qualification with M-M relationships; for example, showing revenue by category and filtering on catalog
  • Reports that raise the level of a filter; for example, a dimensional metric at the Region level, but qualified on store
  • Reports that use non-aggregatable metrics, such as inventory metrics
  • Reports that use dimensional extensions
  • Reports that use attribute-to-attribute comparison in the filter

The default setting for subquery type for Amazon Redshift is Where EXISTS (select (col1, col2…)):

create table T00001 (
       year_id NUMERIC(10, 0),
       W000001 DOUBLE PRECISION);

insert into T00001
select a12.year_id  year_id,
       sum(a11.tot_sls_dlr)  W000001
from   items2 a11
       join   dates    a12
         on   (a11.cur_trn_dt = a12.cur_trn_dt)
where ((exists (select      r11.store_nbr
       from   items r11
       where r11.class_nbr = 1
        and   r11.store_nbr = a11.store_nbr))
 and a12.year_id>1993)
group by      a12.year_id;

Some reports may perform better with the option of using a temporary table and falling back to IN for a correlated subquery. Reports that include a filter with an AND NOT set qualification (such as an AND NOT relationship filter) will likely benefit from using temp tables to resolve the subquery. However, such reports will probably benefit more from the Set Operator Optimization option discussed earlier. The other settings aren’t likely to be advantageous with Amazon Redshift.

VLDB Category | VLDB Property Setting | Value
Query Optimizations | Subquery Type | Use temporary table, falling back to IN for correlated subquery
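Roughly, the temporary-table strategy resolves the subquery once and then qualifies the main query with IN. The following sketch rewrites the earlier EXISTS example that way, against the same tables; it’s a simplified illustration rather than the exact SQL MicroStrategy generates, and the temp table name is hypothetical.

```sql
-- Resolve the subquery once into a temporary table (hypothetical name).
create temp table tmp_class1_stores as
select distinct store_nbr
from items
where class_nbr = 1;

-- Qualify the main query with IN against the temporary table.
select a12.year_id  year_id,
       sum(a11.tot_sls_dlr)  W000001
from   items2 a11
       join   dates    a12
         on   (a11.cur_trn_dt = a12.cur_trn_dt)
where  a11.store_nbr in (select store_nbr from tmp_class1_stores)
  and  a12.year_id > 1993
group by a12.year_id;
```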

Full outer join support

Full outer join support is enabled in the Amazon Redshift object by default. You can set it at the database instance, report, and template levels.

For example, the following query shows the use of a full outer join with the states_dates and regions tables:

select pa0.region_id W000000,
       pa2.month_id W000001,
       sum(pa1.tot_dollar_sales) Column1
from   states_dates pa1
       full outer join       regions     pa0
         on (pa1.region_id = pa0.region_id)
       cross join    LU_MONTH      pa2
group by      pa0.region_id, pa2.month_id;

DISTINCT or GROUP BY option (for no aggregation and no table key)

If no aggregation is needed and the attribute defined on the table isn’t a primary key, this property tells the SQL engine whether to use SELECT DISTINCT, GROUP BY, or neither.

Possible values for this setting include:

  • Use DISTINCT
  • No DISTINCT, no GROUP BY
  • Use GROUP BY

The DISTINCT or GROUP BY option property controls the generation of DISTINCT or GROUP BY in the SELECT SQL statement. The SQL engine doesn’t consider this property if it can make the decision based on its own knowledge. Specifically, the SQL engine ignores this property in the following situations:

  • If there is aggregation, the SQL engine uses GROUP BY, not DISTINCT
  • If there is no attribute (only metrics), the SQL engine doesn’t use DISTINCT
  • If there is COUNT (DISTINCT …) and the database doesn’t support it, the SQL engine performs a SELECT DISTINCT pass and then a COUNT(*) pass
  • If, for certain selected column data types, the database doesn’t allow DISTINCT or GROUP BY, the SQL engine doesn’t use them
  • If the SELECT level is the same as the table key level and the table’s true key property is selected, the SQL engine doesn’t issue a DISTINCT

When none of the preceding conditions are met, the SQL engine uses the DISTINCT or GROUP BY property.
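As an illustration, for a hypothetical lookup table lu_region with no declared key, the three settings produce SQL along these lines (a sketch, not MicroStrategy’s literal output):

```sql
-- "Use DISTINCT"
select distinct region_id, region_name
from lu_region;

-- "Use GROUP BY": same result set, expressed as a grouping.
select region_id, region_name
from lu_region
group by region_id, region_name;

-- "No DISTINCT, no GROUP BY": any duplicate rows are returned as-is.
select region_id, region_name
from lu_region;
```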

Use the latest Amazon Redshift drivers

For running MicroStrategy reports on Amazon Redshift, we encourage upgrading whenever new versions of the Amazon Redshift drivers are available. Running an application on the latest driver provides better performance, bug fixes, and improved security features. To get the latest driver version for your OS, see Drivers and Connectors.


Conclusion

In this post, we discussed various Amazon Redshift cluster optimizations, data model optimizations, and SQL optimizations within MicroStrategy for optimizing your Amazon Redshift and MicroStrategy deployment.

About the Authors

Ranjan Burman is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 13 years of experience in different database and data warehousing technologies. He is passionate about automating and solving customer problems with the use of cloud solutions.

Nita Shah is an Analytics Specialist Solutions Architect at AWS based out of New York. She has been building data warehouse solutions for over 20 years and specializes in Amazon Redshift. She is focused on helping customers design and build enterprise-scale well-architected analytics and decision support platforms.

Bosco Albuquerque is a Sr Partner Solutions Architect at AWS and has over 20 years of experience working with database and analytics products from enterprise database vendors and cloud providers. He has helped large technology companies design data analytics solutions and has led engineering teams in designing and implementing data analytics platforms and data products.

Amit Nayak is responsible for driving the Gateways roadmap at MicroStrategy, focusing on relational and big data databases, as well as authentication. Amit joined MicroStrategy after completing his master’s in Business Analytics at George Washington University and has maintained oversight of the company’s gateways portfolio for the 3+ years he has been with the company.