Yeah, it's a good question, and I can try to break down what we have at Cube with the product and present it as a number of layers. But the thing is that Cube as a product is available under an open source license, which you can self-host and run in Docker or Kubernetes, or you can use a hosted version in Cube Cloud, but there are a lot of moving parts. So everything that I will be referring to down the road, these are more logical parts, right?
It helps to think about Cube as a combination of four layers: the data modeling layer, the access control layer, the caching layer, and the API layer. And they are even arranged in the order in which you would probably configure them to make sure the data from your data stores is delivered to data consumers. The data modeling layer is just another term for the metrics layer or semantic layer. I believe what Cube has in the data modeling layer is very close to what you would do when you define metrics in Looker, or when you configure metrics in the dbt semantic layer.
So the data modeling layer is just that. The goal here is to provide a way to declaratively define what kind of quantitative data, what kind of metrics, need to be calculated on top of the data that you have in your data warehouse, and Cube's data modeling language is conceptually really similar to Looker's, right? You define logical entities. We call them cubes, and they contain measures and dimensions. Measures are basically what you want to calculate; they aggregate the data. And dimensions are the qualitative characteristics of the data that you break your metrics down by.
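To make the split between measures and dimensions concrete, here is a minimal Python sketch of how a declarative cube could be compiled into SQL. The Cube and Measure classes, compile_query, and the public.orders table are hypothetical illustrations of the idea, not Cube's actual data modeling syntax or query compiler.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    sql: str  # column expression to aggregate
    agg: str  # aggregate function name, e.g. "count" or "sum"


@dataclass
class Cube:
    name: str
    sql_table: str
    measures: dict[str, Measure] = field(default_factory=dict)
    dimensions: dict[str, str] = field(default_factory=dict)  # name -> column expression


def compile_query(cube: Cube, measures: list[str], dimensions: list[str]) -> str:
    """Turn a request for measures broken down by dimensions into a GROUP BY query."""
    select = [f"{cube.dimensions[d]} AS {d}" for d in dimensions]
    for m in measures:
        spec = cube.measures[m]
        select.append(f"{spec.agg.upper()}({spec.sql}) AS {m}")
    sql = f"SELECT {', '.join(select)} FROM {cube.sql_table}"
    if dimensions:
        # Group by every requested dimension, referenced by position.
        positions = ", ".join(str(i + 1) for i in range(len(dimensions)))
        sql += f" GROUP BY {positions}"
    return sql


# Hypothetical cube that mirrors a single warehouse table.
orders = Cube(
    name="orders",
    sql_table="public.orders",
    measures={
        "count": Measure(sql="*", agg="count"),
        "total_amount": Measure(sql="amount", agg="sum"),
    },
    dimensions={"status": "status"},
)

print(compile_query(orders, ["count", "total_amount"], ["status"]))
```

Requesting the count and total_amount measures by the status dimension yields SELECT status AS status, COUNT(*) AS count, SUM(amount) AS total_amount FROM public.orders GROUP BY 1: measures become aggregates and dimensions become grouping keys.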
And you can define as many cubes as you want. You can define joins between them and model your domain with cubes. Most of the time, a cube might be defined to reflect a single table in the data warehouse, so you just define it as something like select star from a table in the data warehouse, but it doesn't need to be.
So if you already have a data transformation layer upstream, most of your cubes might just be select star from a table. And if you don't, you can ask Cube to model cubes on top of pretty much any SQL. That might get complex at times, but still, if it makes sense for your business domain, you should feel free to do so.
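A cube does not have to map to a single table. The sketch below, again using hypothetical names (CubeSource, from_clause, join_clause, and the public.users and public.orders tables), shows one way a cube backed by arbitrary SQL could be wrapped as a subquery so that it joins with a table-backed cube in exactly the same way. It illustrates the idea only; it is not how Cube generates SQL internally.

```python
from dataclasses import dataclass


@dataclass
class CubeSource:
    """Hypothetical cube base: either a plain table or an arbitrary SQL query."""
    name: str
    sql: str  # "SELECT * FROM some_table" or any custom SELECT


def from_clause(cube: CubeSource) -> str:
    # Wrap the cube's SQL as a subquery aliased by the cube name, so a cube
    # backed by complex SQL is queried the same way as one backed by a table.
    return f"({cube.sql}) AS {cube.name}"


def join_clause(left: CubeSource, right: CubeSource, on: str) -> str:
    # A hypothetical join between two cubes, declared once in the model.
    return f"{from_clause(left)} LEFT JOIN {from_clause(right)} ON {on}"


# Table-backed cube: just "select star".
users = CubeSource("users", "SELECT * FROM public.users")

# Cube on top of custom SQL: derived column plus a filter.
orders = CubeSource(
    "orders",
    "SELECT o.*, o.amount - o.discount AS net_amount "
    "FROM public.orders o WHERE o.is_test = false",
)

sql = (
    "SELECT users.country, SUM(orders.net_amount) AS revenue FROM "
    + join_clause(orders, users, "orders.user_id = users.id")
    + " GROUP BY 1"
)
print(sql)
```

Because both cubes are reduced to an aliased subquery, the code that builds joins and aggregates does not need to know whether a cube came from a plain table or from hand-written SQL.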
So that's data modeling. You make sure that you have your measures and dimensions and group them into entities, which are called cubes. And then you have the access control layer, and it makes sense to have it right next to data modeling because it allows you to provide consistent access control.
Right? So regardless of who would be accessing those metrics, a data analyst from a data notebook or a CEO in a BI tool such as Metabase, Superset, or Tableau, their query would need to pass through the access control layer. It would make sure that row-level security or role-based access is enforced.
Cube provides tools to make sure that meta information is passed from the data consumers. So you can support all kinds of multi-tenancy scenarios, right? Where you have different groups of users, or users whose access differs. So you can restrict access to certain metrics, or to certain rows within your tables in the data warehouse, for some groups of users or for some users in particular.
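Cube's configuration lets you rewrite incoming queries based on the security context attached to each request. The function below is a standalone Python sketch of that idea, not Cube's actual hook: the tenant_id and role fields, the orders.tenant_id dimension, and the rule that only admins may read orders.total_amount are all assumptions for illustration. The filter dictionary uses the member/operator/values shape of Cube's REST query format.

```python
import copy


def apply_row_level_security(query: dict, security_context: dict) -> dict:
    """Return a copy of a Cube-style query restricted to the caller's tenant.

    Hypothetical policy: every user carries a "tenant_id" in their security
    context, admins see all rows, and only admins may read the
    "orders.total_amount" measure.
    """
    rewritten = copy.deepcopy(query)  # never mutate the caller's query
    if security_context.get("role") == "admin":
        return rewritten  # admins are allowed everything

    tenant_id = security_context.get("tenant_id")
    if tenant_id is None:
        raise PermissionError("missing tenant_id in security context")

    # Metric-level restriction: reject queries for restricted measures.
    restricted = {"orders.total_amount"}
    denied = restricted.intersection(rewritten.get("measures", []))
    if denied:
        raise PermissionError(f"access denied to: {sorted(denied)}")

    # Row-level restriction: only rows belonging to this tenant are visible.
    rewritten.setdefault("filters", []).append(
        {"member": "orders.tenant_id", "operator": "equals", "values": [str(tenant_id)]}
    )
    return rewritten
```

Because every consumer's query passes through the same function, a notebook user and a BI dashboard get the same restrictions without either tool knowing about them.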
And for everyone else, you can allow access to everything. Then there is the caching layer, and I'm not a fan of that title because I prefer to call it the acceleration layer. The goal of that is to make sure that every query executed by Cube can be fulfilled within a set period of time.
Most of the time, that's no more than two or three hundred milliseconds. And Cube allows for high concurrency, maybe a thousand requests per second, no problem. There is an amazing technology under the hood of that; it would probably take some time to really dig deeper into it, but borrowing from what's currently hyped in the data community, I would just say that the way the caching layer is built under the hood is really similar to how DuckDB works. Inside Cube there is this custom-built data store. It's columnar storage, and the only prominent difference from DuckDB is that in Cube's case it's distributed.
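The transcript doesn't go into Cube Store's internals, so the following is only a generic illustration of why a distributed store can parallelize aggregations: each worker computes partial SUM and COUNT values over its own partition, and a coordinating step merges them. The partition data and function names are hypothetical, and threads stand in for separate worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Worker step: per-group [sum, count] over one partition."""
    acc = defaultdict(lambda: [0.0, 0])
    for status, amount in rows:
        acc[status][0] += amount
        acc[status][1] += 1
    return dict(acc)


def merge(partials):
    """Coordinator step: combine the per-partition partial aggregates."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for status, (s, c) in part.items():
            total[status][0] += s
            total[status][1] += c
    # AVG is recomputed from merged SUM and COUNT, never by averaging averages.
    return {k: {"sum": s, "count": c, "avg": s / c} for k, (s, c) in total.items()}


# Hypothetical (status, amount) rows split across three partitions.
partitions = [
    [("paid", 10.0), ("paid", 30.0), ("refunded", 5.0)],
    [("paid", 20.0), ("refunded", 15.0)],
    [("paid", 40.0)],
]

# Threads stand in for separate workers that each own one partition.
with ThreadPoolExecutor() as pool:
    result = merge(pool.map(partial_aggregate, partitions))

print(result)
```

The key design point is that the partial results are small and mergeable, so adding workers splits the scan without changing the final answer.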
Being distributed, it can parallelize the calculations when you have lots of data, but the rest is the same. It's a columnar store written in Rust, and what Cube does, having your metrics defined in the data modeling layer, is preemptively and asynchronously cache the data that would be needed to fulfill requests and store it in an intermediate format in Parquet files. Inside Cube Store there are, I would say, proven technologies. As I said, the data is stored in Parquet files, we use the Apache Arrow format for in-memory processing and basically to transfer the data, and we use the Apache Arrow DataFusion library for query orchestration and query planning.
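Cube calls these precomputed results pre-aggregations. Here is a minimal sketch of the underlying idea with hypothetical data and names: a rollup of SUM and COUNT by status and country is built ahead of time, and a later query at a coarser grain is answered by re-aggregating the rollup instead of scanning raw rows. Cube's real matching rules are more involved; this only shows the additive-measure case.

```python
from collections import defaultdict

# Hypothetical raw fact rows: (status, country, amount).
RAW_ORDERS = [
    ("paid", "US", 10.0),
    ("paid", "DE", 30.0),
    ("refunded", "US", 5.0),
    ("paid", "US", 20.0),
]

ROLLUP_DIMENSIONS = ("status", "country")


def build_rollup(rows):
    """Run ahead of time (e.g. on a schedule): SUM and COUNT per status/country."""
    rollup = defaultdict(lambda: {"total_amount": 0.0, "count": 0})
    for status, country, amount in rows:
        cell = rollup[(status, country)]
        cell["total_amount"] += amount
        cell["count"] += 1
    return dict(rollup)


def answer_from_rollup(rollup, dimensions):
    """Serve a query by re-aggregating the rollup instead of scanning raw rows.

    This works because SUM and COUNT are additive and the requested dimensions
    are a subset of the rollup's dimensions; otherwise return None (cache miss).
    """
    if not set(dimensions) <= set(ROLLUP_DIMENSIONS):
        return None
    idx = [ROLLUP_DIMENSIONS.index(d) for d in dimensions]
    out = defaultdict(lambda: {"total_amount": 0.0, "count": 0})
    for key, cell in rollup.items():
        group = tuple(key[i] for i in idx)
        out[group]["total_amount"] += cell["total_amount"]
        out[group]["count"] += cell["count"]
    return dict(out)


rollup = build_rollup(RAW_ORDERS)
print(answer_from_rollup(rollup, ["status"]))
```

On a miss (for example, a dimension the rollup doesn't contain), a real system would fall back to querying the warehouse; the sketch just returns None.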
So I would say Cube Store is more like gluing together some well-known and proven pieces of technology in the data space, right? And that basically allows any query coming into Cube to be fulfilled within two or three hundred milliseconds. And here we come to the last piece: well, where are these queries coming from, right?
So, the last piece is the API layer, and that's an interesting one. In the very beginning, Cube only had a single API, which was the REST API. That one basically allows you to access the data through HTTPS requests, right? That's something you would use when you are building a front-end app, or if you're building some kind of automation, right?
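As a rough sketch of calling the REST API from Python with only the standard library: the /cubejs-api/v1/load endpoint and the measures/dimensions/timeDimensions query shape follow Cube's REST API, while the base URL, the token, and the orders.* member names are placeholders you would replace with your own deployment's values.

```python
import json
import urllib.request

# Assumptions: a Cube deployment at this base URL (4000 is the usual local
# development port) and an already-issued API token; both are placeholders.
CUBE_API_URL = "http://localhost:4000/cubejs-api/v1/load"
CUBE_API_TOKEN = "<your-jwt-token>"


def build_load_request(query: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Cube's REST API for data."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        CUBE_API_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": CUBE_API_TOKEN,
        },
    )


# Hypothetical members: order count by status, per month.
query = {
    "measures": ["orders.count"],
    "dimensions": ["orders.status"],
    "timeDimensions": [
        {"dimension": "orders.created_at", "granularity": "month"}
    ],
}
req = build_load_request(query)

# Sending it against a running deployment; the JSON response is expected to
# carry the result rows under "data".
# with urllib.request.urlopen(req) as resp:
#     rows = json.load(resp)["data"]
```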
Such automation would just make HTTPS requests. But for more than a year already, Cube has also had a couple of other APIs. For front-end developers, there's GraphQL: Cube also has a GraphQL API. But what I'm really excited about is that, for more than a year now, Cube has also had a SQL API, right?
And the SQL API that Cube provides is Postgres-compliant. It represents Cube as a Postgres database, and the cubes, measures, and dimensions that you define in your data modeling layer are available as Postgres tables and columns within those tables.
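Because the SQL API speaks Postgres, querying it from Python looks like querying any Postgres database. A minimal sketch follows: build_sql_api_query and run_query are hypothetical helpers, MEASURE() is the SQL API function that applies a measure's aggregation as defined in the data model, and the connection parameters (including port 15432) are placeholders for your own deployment's settings.

```python
def build_sql_api_query(cube: str, dimensions: list[str], measures: list[str]) -> str:
    """Build a query for Cube's Postgres-compatible SQL API.

    Cubes appear as tables and their measures/dimensions as columns;
    MEASURE() asks Cube to apply the aggregation defined in the data model.
    """
    cols = list(dimensions) + [f"MEASURE({m})" for m in measures]
    sql = f"SELECT {', '.join(cols)} FROM {cube}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql


def run_query(sql: str):
    """Send the query with any Postgres driver; psycopg2 is one option.

    Host, port, user, password, and database are placeholders for whatever
    your Cube deployment is configured with.
    """
    import psycopg2  # imported lazily so the builder works without the driver

    conn = psycopg2.connect(
        host="localhost", port=15432, user="cube", password="<password>", dbname="db"
    )
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


print(build_sql_api_query("orders", ["status"], ["count"]))
# SELECT status, MEASURE(count) FROM orders GROUP BY 1
```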
So, yeah, it's really great to have the SQL API compatible with, I would say, the most widespread, most popular wire protocol of the most popular database out there. I mean Postgres, because that instantly gives you the ability to connect to whatever tool you have, if that tool supports Postgres.
Right. So I remember when we had just launched our SQL API, we tested it with Apache Superset, which is a really cool open source BI platform. And it worked, right?
So you take Metabase, take Power BI, take Tableau, take whatever you have. If your tool can interface with Postgres, it can talk to Cube, and that means your data can be delivered from Cube to the tool. And as I said, all of the access control configuration, and whatever you configured in the caching layer, would take effect as well.
So, yeah, and that brings it all together. If you have Cube in your data pipeline, you basically have universal connectivity to whatever data stores you have, and you have consistent metric definitions and query performance.
And you can connect it to literally every tool out there that you use to display or represent your data.