Dell Data Lakehouse – Going Warp Speed

By Vrashank Jain – Senior Product Manager, Dell Technologies

It’s been a busy spring and summer here at Dell. In March, on the first day of spring, we launched the Dell Data Lakehouse. This turnkey solution features a query engine powered by Starburst, which provides high-performance, high-concurrency access to distributed data, regardless of the data source. The foundation of the Dell Data Lakehouse is Dell S3-compatible storage, which provides a high-performance, highly available layer for storing and querying data in open formats like Iceberg.

In today’s fast-paced world, IT and data leaders face a tough challenge: accelerating analytics and AI while keeping costs in check. The tradeoff between speed and cost can prove expensive on either end. While adopting the data lakehouse offers performance advantages at lower costs, data engineering and IT teams still grapple with deciding which data to optimize and cache and which to leave as is in the data lake.

Consider this feedback from data teams:

  • “I’m the data engineer, and the analysts keep asking me to make copies and re-partition data (by date, by customer) when they have a new question so that response times are tolerable. It’s a waste of my time and storage isn’t free!”
  • “I’m the Data Leader for the data org, and I can’t just keep throwing money at more and bigger clusters to meet my query response time SLAs! Data sizes are growing faster than my budget.”

This is why we’re thrilled to introduce Warp Speed in the Dell Data Lakehouse for data on Dell’s S3-compatible storage!

What is Warp Speed?

Warp Speed is a new feature in the Dell Data Lakehouse that autonomously learns query patterns and identifies frequently accessed data to create optimal indexes and caches while keeping infrequently accessed data where it is.

What does it do?

It delivers on the promise of accelerating query performance while keeping costs in check. With Warp Speed, the same cluster can run data lake queries 3x to 5x faster without the end user changing the query. It can also help reduce cluster sizes by up to 40%.

More simply put, organizations can run more queries on their existing clusters or run the same volume of queries on smaller clusters.

  • Accelerating data lakes: Autonomously index the data lake and accelerate exploratory datasets on demand, without involving data engineering.
  • Building high-performance dashboards: Faster drill-down on TBs to PBs of data, without any change to the end user experience. The same queries now just run faster.

How does it do that?

Warp Speed employs a combination of acceleration technologies to achieve its remarkable performance:

  • Autonomous Indexing: This feature creates appropriate index types (bitmap, dictionary, tree) tailored to each data block, accelerating operations such as joins, filters, and searches. Indexes are stored on SSDs in the compute nodes for rapid access.
  • Smart Caching: Smart caching is a proprietary SSD-based columnar block cache that optimizes performance based on how frequently data is used. Caching eliminates unnecessary table scans and lets queries reuse more data, thus saving compute costs.
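To make the idea of caching by frequency of use more concrete, here is a minimal Python sketch. It is not Warp Speed's implementation, which is proprietary; the class name FrequencyBlockCache, the admit_after threshold, and the block IDs are all hypothetical and chosen only for illustration. The sketch counts reads per columnar block, admits a block to the cache once it has been read often enough, and leaves rarely read blocks in the data lake.

```python
from collections import Counter


class FrequencyBlockCache:
    """Toy cache that admits a block only once it has been read often enough.

    Illustrative only: names and policy are assumptions, not Warp Speed's design.
    """

    def __init__(self, capacity, admit_after=2):
        self.capacity = capacity          # max number of blocks held in the cache
        self.admit_after = admit_after    # reads needed before a block is cached
        self.reads = Counter()            # block id -> number of reads seen
        self.cached = {}                  # block id -> block data

    def get(self, block_id, load_from_lake):
        """Return (data, hit); hit is True when served from the cache."""
        self.reads[block_id] += 1
        if block_id in self.cached:
            return self.cached[block_id], True
        data = load_from_lake(block_id)   # cold path: read from object storage
        if self.reads[block_id] >= self.admit_after:
            self._admit(block_id, data)
        return data, False

    def _admit(self, block_id, data):
        if len(self.cached) >= self.capacity:
            # Find the cached block with the fewest reads so far.
            coldest = min(self.cached, key=lambda b: self.reads[b])
            if self.reads[coldest] >= self.reads[block_id]:
                return                    # newcomer is not hotter; keep cache as is
            del self.cached[coldest]
        self.cached[block_id] = data


def demo():
    cache = FrequencyBlockCache(capacity=2)
    loader = lambda block_id: f"rows-of-{block_id}"
    # Hypothetical access pattern: one hot block, one block read only once.
    pattern = ["sales.date#0", "sales.date#0", "sales.date#0", "logs.msg#7"]
    return [cache.get(block, loader)[1] for block in pattern]


print(demo())
```

In this toy pattern, only the third read of the hot block is a cache hit, and the block that is read once never takes up cache space. That mirrors the behavior described above: frequently accessed data is accelerated, and infrequently accessed data stays where it is.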

How do I get it?

Starting July 17th, Warp Speed will be available to all Dell Data Lakehouse customers and supported for those who are using Dell S3-compatible storage as their data lake. There is no change to the software license – this is now built-in! The configuration of the compute nodes will be modified to include SSDs that have been fully tested and benchmarked by Dell to support the Warp Speed index and cache.

Accelerated Innovation in the AI Era

Dell Data Lakehouse with Warp Speed sets a new benchmark in data lake analytics, empowering organizations to derive insights from their data more quickly and efficiently than ever before. Warp Speed unlocks the full potential of the Dell Data Lakehouse, paving the way for accelerated and budget-friendly innovation and growth in the AI era.

To get a full, hands-on experience, visit the Dell Demo Center to interactively explore the Dell Data Lakehouse with labs hand-picked for you by Dell Technologies’ experts. You can also contact your Dell account executive to explore the Dell Data Lakehouse for your data needs.

And check out this blog to find out more about the latest release of the Dell Data Lakehouse!

Note on Performance Benchmarking:

These performance benchmarks are based on testing conducted by Dell in July 2024 using TPC-DS 1TB and 10TB datasets stored on Dell ECS S3-compatible object storage against a variety of Dell Data Lakehouse cluster sizes (6, 11, and 16 worker nodes), with each worker node configured with 64 vCPUs and 256GB RAM. The TPC-DS benchmark queries cover a wide range of query scenarios, including reporting, ad hoc, and interactive.

  • Our results show that Warp Speed generally improves performance across such scenarios, by 3x to 5x for the top 20% of queries.
  • Compute savings are estimated by comparing the total queries executed per 10 minutes by a 6-worker-node cluster with Warp Speed to that of an 11-worker-node cluster without Warp Speed, and extrapolating how many Warp Speed-enabled nodes could provide the same level of performance as the 11-worker-node cluster without Warp Speed. We repeated this test for both 1TB and 10TB datasets.
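
The extrapolation in the second bullet can be sketched as simple arithmetic. The Python below is only an illustration: it assumes throughput scales linearly with the number of worker nodes (an assumption, not something stated in the test description), and the query counts are hypothetical placeholders, not Dell's measured results.

```python
import math


def nodes_needed(ws_queries, ws_nodes, baseline_queries):
    """Warp Speed nodes needed to match the baseline cluster's throughput.

    Assumes throughput grows linearly with node count (illustrative assumption).
    """
    per_node = ws_queries / ws_nodes
    return math.ceil(baseline_queries / per_node)


def compute_saving(ws_queries, ws_nodes, baseline_queries, baseline_nodes):
    """Fraction of baseline nodes saved at equal throughput."""
    needed = nodes_needed(ws_queries, ws_nodes, baseline_queries)
    return 1 - needed / baseline_nodes


# Hypothetical placeholder counts of queries completed per 10 minutes.
WS_QUERIES = 90         # 6-worker-node cluster with Warp Speed
BASELINE_QUERIES = 100  # 11-worker-node cluster without Warp Speed

needed = nodes_needed(WS_QUERIES, 6, BASELINE_QUERIES)
saving = compute_saving(WS_QUERIES, 6, BASELINE_QUERIES, 11)
print(f"nodes needed: {needed}, estimated saving: {saving:.0%}")
```

With these placeholder inputs, each Warp Speed node handles 15 queries per 10 minutes, so 7 nodes would be needed to match the 11-node baseline. The actual figures depend on the measured query counts for each dataset size.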
