What is Amazon Redshift?

Amazon Redshift (also known as AWS Redshift) is a fully managed, petabyte-scale, cloud-based data warehouse offered by Amazon Web Services. It’s designed to store and analyze large-scale data and is widely used for data warehousing, business reporting, and running complex queries. Redshift is also commonly used for large-scale data migrations from traditional on-premises warehouses to the AWS cloud.

AWS Redshift stores data in a column-oriented format and connects to SQL-based clients and business intelligence tools, enabling near real-time access to your data. While Redshift’s SQL interface is largely compatible with PostgreSQL, it’s a purpose-built massively parallel processing (MPP) engine optimized for data analytics. Redshift delivers fast, efficient querying that helps your team make sound business analyses and decisions.

Key takeaways

Amazon Redshift is a fully managed, cloud-based data warehouse solution designed for fast and scalable analytics on large datasets.
It supports complex queries using Massively Parallel Processing (MPP) and columnar storage, making it ideal for big data and business intelligence workloads.
Redshift integrates with other AWS services, such as Amazon S3, Amazon RDS, AWS Glue, and Amazon SageMaker, for a seamless data ecosystem.
With Redshift Serverless, users can run analytics without managing infrastructure, paying only for what they use.
Built-in security features include data encryption, IAM access control, and support for Virtual Private Cloud (VPC) configurations.
Redshift’s cost-effective, scalable model makes it a strong alternative to traditional on-premises data warehouses.

What is an AWS Redshift cluster?

Each Amazon Redshift data warehouse contains a collection of computing resources (nodes) organized in a cluster. Each Redshift cluster runs its own Redshift engine and contains at least one database. These clusters can be launched in a Virtual Private Cloud (VPC) and can connect securely with services like Amazon S3, Amazon DynamoDB, and Amazon EMR.

Is Amazon Redshift a relational database?

Yes. Redshift is a relational database: data is organized into tables of rows and columns and queried with SQL. Unlike a transactional relational database, it is Amazon’s analytics database, designed to crunch large amounts of data as a data warehouse. Under the hood, it runs on clusters of nodes in the cloud rather than on a single database server.

Is AWS Redshift fully managed?

Yes. Redshift is a fully managed cloud data warehouse, which means AWS handles infrastructure tasks such as provisioning, patching, and backups. It scales from gigabytes to petabytes, supports data transfer to and from Amazon S3, and allows you to use your data to get new business insights.

Is AWS Redshift good for OLAP?

AWS Redshift was designed for online analytical processing (OLAP) and BI tools. Workloads that run complex queries over large datasets are therefore a natural fit for Amazon Redshift.

Amazon Redshift vs traditional data warehouses

Amazon Redshift is a direct alternative to traditional on-premises data warehouses. Let’s look at how Redshift stacks up to traditional warehousing in the following areas:

Performance
Cost
Scalability
Security

AWS Redshift performance

Amazon Redshift is best known for its speed. Redshift delivers fast query speeds on large datasets, handling data sizes of a petabyte and more. The speed at which Redshift processes data is hard to match with traditional data warehousing, making it a strong choice for applications that run large volumes of queries on demand.

This level of performance is made possible by two architectural elements: columnar data storage and a massively parallel processing (MPP) design. Let’s look at each one and see how they enable fast processing in Redshift.

Redshift’s Massively Parallel Processing (MPP) explained

Redshift’s MPP design automatically distributes the workload evenly across multiple nodes in each cluster, enabling speedy processing of even the most complex queries operating on massive amounts of data. Multiple nodes share the processing of all SQL operations in parallel, and their partial results are then combined in a final aggregation step. Users can optimize the distribution of data by placing it where it needs to be before the query is executed. This is done by choosing an appropriate distribution style for each table, which minimizes the redistribution step in which rows are moved between nodes at query time.
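The idea can be illustrated with a toy Python sketch. This is not Redshift’s implementation, only a simplified model under stated assumptions: lists stand in for nodes, a thread pool stands in for parallel execution, and the function names (distribute_by_key, node_partial_sum, mpp_sum_by_key) and the sample sales rows are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib


def distribute_by_key(rows, key, num_nodes):
    """Place each row on a node by hashing its distribution key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 is used only because it is deterministic across runs.
        node_id = zlib.crc32(str(row[key]).encode()) % num_nodes
        nodes[node_id].append(row)
    return nodes


def node_partial_sum(node_rows, group_col, value_col):
    """Work one node does on its own slice: a local GROUP BY ... SUM."""
    partial = defaultdict(int)
    for row in node_rows:
        partial[row[group_col]] += row[value_col]
    return dict(partial)


def mpp_sum_by_key(rows, group_col, value_col, num_nodes=4):
    """Distribute rows, aggregate per node in parallel, then merge."""
    nodes = distribute_by_key(rows, group_col, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(
            pool.map(lambda n: node_partial_sum(n, group_col, value_col), nodes)
        )
    # Final aggregation step: combine the per-node partial results.
    final = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            final[group] += subtotal
    return dict(final)


# Hypothetical sample data.
sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 20},
    {"region": "east", "amount": 30},
    {"region": "north", "amount": 5},
    {"region": "west", "amount": 5},
]
totals = mpp_sum_by_key(sales, "region", "amount")
print(totals)
```

Because the rows are distributed on the same column the query groups by, every row for a given region already sits on one node, so no rows need to move between nodes before aggregating. That is the effect a well-chosen distribution style aims for.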

Redshift columnar data storage explained

By using columnar storage for database tables, Amazon Redshift reduces disk I/O requirements, which helps optimize analytic query performance. Analytic queries typically read a few columns across many rows. When a table is stored by column, only the data for the referenced columns is read, so both the number of disk I/O requests and the amount of data loaded from disk are reduced. With less data loaded into memory, Redshift can perform more in-memory processing for each query, and queries take less time than they would if the data were stored by row.
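A small Python sketch can make the I/O difference visible. It is a simplified model, not how Redshift lays out blocks on disk: it counts how many values have to be touched to sum one column under each layout. The table contents and function names are hypothetical.

```python
# Hypothetical four-column table, stored row by row.
rows = [
    {"order_id": 1, "customer": "a", "region": "east", "amount": 10.0},
    {"order_id": 2, "customer": "b", "region": "west", "amount": 20.0},
    {"order_id": 3, "customer": "c", "region": "east", "amount": 30.0},
    {"order_id": 4, "customer": "d", "region": "north", "amount": 5.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def sum_row_store(records, column):
    """Row layout: whole rows are read even though one field is needed."""
    values_read = 0
    total = 0.0
    for record in records:
        values_read += len(record)  # every field of the row is touched
        total += record[column]
    return total, values_read


def sum_column_store(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_reads = sum_row_store(rows, "amount")
col_total, col_reads = sum_column_store(columns, "amount")
print(row_total, row_reads)
print(col_total, col_reads)
```

Both layouts produce the same total, but the row layout touches every field of every row while the column layout touches only the amount values. The gap widens as tables gain more columns.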

AWS Redshift cost

Cost is often a critical factor when selecting a data warehousing solution. Traditional on-premises data warehouses require hefty upfront investments in hardware, infrastructure, and ongoing maintenance, which can cost millions of dollars.

By contrast, Amazon Redshift provides high performance at an affordable price. As a fully managed solution, Redshift has no hardware for you to purchase or maintain, and you can get started with just a few clicks.

With on-demand pricing, you only pay for the compute and storage you use. Database admins can set up data warehouses quickly, without the lengthy procurement process and leadership buy-in that multi-million-dollar on-premises hardware requires.

AWS Redshift scalability

Scaling traditional on-premises data warehouses is slow, expensive, and often disruptive. If your data needs change, you’re forced to purchase and integrate new hardware, which often requires lengthy procurement cycles.

Redshift offers more flexibility and elastic scale. As your requirements change, you can scale a cluster up or down with a few clicks in the management console, typically within minutes, to match your capacity and performance needs.
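The same resize can also be requested programmatically. The sketch below assumes the boto3 library is installed and AWS credentials are configured. The helper names build_resize_params and resize_cluster_nodes are hypothetical, and the cluster identifier and region in the usage comment are placeholders. Only the parameter-building function runs without AWS access.

```python
def build_resize_params(cluster_id, number_of_nodes, node_type=None, classic=False):
    """Build keyword arguments for the Redshift ResizeCluster API call."""
    params = {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": number_of_nodes,
        # Classic=False asks for an elastic resize rather than a classic one.
        "Classic": classic,
    }
    if node_type is not None:
        params["NodeType"] = node_type
    return params


def resize_cluster_nodes(params, region_name):
    """Send the resize request (needs boto3 and valid AWS credentials)."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    client = boto3.client("redshift", region_name=region_name)
    return client.resize_cluster(**params)


resize_params = build_resize_params("examplecluster", 4)
print(resize_params)

# Usage against a real account (placeholder region):
# resize_cluster_nodes(resize_params, region_name="us-east-1")
```

Keeping the parameters in a plain dictionary makes it easy to review or log a resize request before it is sent.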

Cost-wise, on-demand pricing ensures you only pay for what you use. Not being tied to expensive hardware and lengthy maintenance contracts means organizations are free to change course without absorbing sunk costs. Redshift also offers newer node types: RA3 nodes, which use managed storage so that compute and storage scale independently, and dense compute (DC2) nodes, which use local SSD storage for compute-intensive workloads. This gives you access to processing power on demand.

Security in Redshift

Security is a top concern for any cloud storage or data warehousing solution. While some organizations may hesitate to move away from the perceived control of on-premise environments, Amazon Web Services makes security a foundational priority.

AWS follows the shared responsibility model:

Security of the cloud: AWS secures the underlying infrastructure that supports services like Amazon Redshift, Amazon EC2, and Amazon S3.
Security in the cloud: You control your data security, access permissions, and compliance configurations based on the AWS service you use.

Redshift integrates with AWS Identity and Access Management (IAM) for fine-grained control over data access. Data in transit is encrypted using SSL, and data at rest is encrypted automatically when clusters are created. Clusters launched in a Virtual Private Cloud (VPC) run in a secure, isolated network environment. You can also integrate Redshift with services like Amazon S3 and AWS Glue for secure, end-to-end data transfer and ETL workflows.

How to get started with AWS Redshift

To set up Redshift:

Create an AWS account.
Open a firewall port. By default, Redshift uses port number 5439, but that connection won’t work if the port isn’t open in your firewall. The port number can’t be changed once the cluster is created.
Configure IAM roles for secure data transfer. Check out these instructions on how to create an IAM role.
Launch an AWS Redshift cluster by following the steps below.
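For the IAM role step, a role that Redshift can use generally needs a trust policy naming the Redshift service as the principal allowed to assume it. The sketch below is one possible way to create such a role with boto3, assuming boto3 is installed and your credentials allow IAM changes. The helper names are hypothetical, and attaching the AmazonS3ReadOnlyAccess managed policy is an assumption about what the role needs; adjust it to your own data sources.

```python
import json

# Trust policy that lets the Redshift service assume the role.
REDSHIFT_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_policy_json():
    """Serialize the trust policy in the JSON form the IAM API expects."""
    return json.dumps(REDSHIFT_TRUST_POLICY)


def create_redshift_role(role_name="myRedshiftRole"):
    """Create the role and attach read-only S3 access (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the policy helpers run anywhere

    iam = boto3.client("iam")
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=trust_policy_json(),
    )
    # Assumption: the cluster only needs to read from S3.
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )
    return role["Role"]["Arn"]


print(trust_policy_json())

# Usage against a real account:
# role_arn = create_redshift_role()
```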

Steps to launch an AWS Redshift cluster

After completing the steps above, here’s how you can launch a Redshift cluster:

Open the Amazon Redshift console.
Select the region in which you want to create the cluster.
Choose Create cluster and enter the following values. These are default values for those wanting to explore Redshift while incurring minimal charges. If you already have specific values in mind for your use case, use those instead.
- Node type: dc2.large.
- Number of compute nodes: 2.
- Cluster identifier: examplecluster.
- Master user name: awsuser.
- Master user password and Confirm password: Enter a password for the master user account.
- Database port: 5439.
- Available IAM roles: Choose myRedshiftRole.
Click Launch cluster. When done, click Close to return to the list of clusters. The cluster you just launched should be listed there. Check that the Cluster Status says available, and Database Health says healthy.
Choose the cluster you just launched. Click the Cluster button just above the list then click on Modify cluster. In the dialog box that appears, choose the VPC security groups you want to associate with this cluster then click Modify to save the association.
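The console steps above can also be approximated in code. The following is a sketch, not an official procedure, assuming boto3 is installed and credentials are configured. It reuses the example values from the list (dc2.large, 2 nodes, examplecluster, awsuser, port 5439). The function names, the account ID in the role ARN, and the password in the example are placeholders.

```python
def build_create_cluster_params(master_password, iam_role_arn):
    """Mirror the console values as keyword arguments for CreateCluster."""
    return {
        "ClusterIdentifier": "examplecluster",
        "ClusterType": "multi-node",
        "NodeType": "dc2.large",
        "NumberOfNodes": 2,
        "MasterUsername": "awsuser",
        "MasterUserPassword": master_password,
        "Port": 5439,
        "IamRoles": [iam_role_arn],
    }


def launch_cluster(params, region_name):
    """Create the cluster and block until it is available (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the helper above runs anywhere

    client = boto3.client("redshift", region_name=region_name)
    client.create_cluster(**params)
    # Poll until the cluster status becomes "available".
    client.get_waiter("cluster_available").wait(
        ClusterIdentifier=params["ClusterIdentifier"]
    )


# Placeholder values: supply a real password from a secret store and a real role ARN.
example_params = build_create_cluster_params(
    master_password="Replace-Me-1",
    iam_role_arn="arn:aws:iam::123456789012:role/myRedshiftRole",
)
print({k: v for k, v in example_params.items() if k != "MasterUserPassword"})

# Usage against a real account (placeholder region):
# launch_cluster(example_params, region_name="us-east-1")
```

In practice the password would come from an environment variable or a secrets manager rather than being written into the script.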

After you complete these steps, your Redshift cluster is launched. To connect to it, you need to configure a security group to authorize access. If the cluster is launched in the EC2-VPC platform, follow these instructions from AWS.
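Once the security group and firewall rules are in place, a quick way to confirm that the cluster port is reachable from your machine is a plain TCP connection test. The sketch below uses only the Python standard library. The function name port_is_reachable is hypothetical, and the endpoint in the usage comment is a placeholder for your cluster’s actual endpoint.

```python
import socket


def port_is_reachable(host, port=5439, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Usage (placeholder endpoint; copy the real one from the cluster details page):
# port_is_reachable("examplecluster.example.us-east-1.redshift.amazonaws.com")
```

A result of False usually points to a security group rule, a firewall rule, or a cluster that is not reachable from your network, and is worth checking before debugging the SQL client itself.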

How to run queries with a Redshift cluster

Now that you have launched a cluster, you can connect to it and start running queries. You can run queries in either of these ways:

Connect to your cluster from the AWS Management Console using the AWS Query Editor.
Connect to your cluster through a SQL client tool like SQL Workbench/J.

At this point, your Redshift cluster is ready to use. You can create tables in the database, load data into the tables, and try running queries, either through the AWS Query Editor or through a SQL client tool of your choice.
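As a rough illustration of those first activities, the sketch below builds three SQL statements (create a table, load it from Amazon S3, run an aggregate query) and shows one way to submit them through the Redshift Data API with boto3. The sales table, its columns, the S3 path, and the role ARN are hypothetical. The database name dev assumes the cluster was created without a custom database name.

```python
def build_statements(s3_path, iam_role_arn):
    """Return SQL to create a table, load it from S3, and query it."""
    create_table = (
        "CREATE TABLE IF NOT EXISTS sales (\n"
        "    sale_id INTEGER,\n"
        "    region  VARCHAR(32),\n"
        "    amount  DECIMAL(10, 2),\n"
        "    sold_at DATE\n"
        ")\n"
        # Distribute on the column used for grouping; sort on the date column.
        "DISTSTYLE KEY DISTKEY (region) SORTKEY (sold_at);"
    )
    copy_from_s3 = (
        f"COPY sales FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV IGNOREHEADER 1;"
    )
    query = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region;"
    return [create_table, copy_from_s3, query]


def run_statements(statements, cluster_id="examplecluster", database="dev", db_user="awsuser"):
    """Submit the statements in order via the Data API (needs boto3 and credentials)."""
    import boto3  # third-party; imported lazily so build_statements runs anywhere

    client = boto3.client("redshift-data")
    response = client.batch_execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sqls=statements,
    )
    # The call is asynchronous; the returned Id can be used to check status later.
    return response["Id"]


# Placeholder S3 path and role ARN.
statements = build_statements(
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/myRedshiftRole",
)
for statement in statements:
    print(statement, end="\n\n")

# Usage against a real cluster:
# statement_id = run_statements(statements)
```

The same SQL can be pasted into the AWS Query Editor or a client such as SQL Workbench/J; the Python wrapper is only one option.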

Start monitoring your Amazon Redshift with Sumo Logic

From setting up your cluster to executing complex queries, Redshift helps you handle large amounts of data and gain insights from it. But you still need to make sure your Redshift environment keeps running effectively after setup. Amazon provides built-in monitoring tools, which are helpful for basic performance tracking, but these tools can be limited and inflexible.

Solutions like Sumo Logic integrate with Redshift to provide deep, real-time insights. With Sumo Logic, you gain more granular visibility into performance metrics, query behavior, and system health, helping you get the most out of your Amazon Redshift investment.

Interested in how to optimize your Redshift cluster monitoring and performance? Learn how Sumo Logic helps you manage and monitor your Redshift environment.