How fast is my query in a headless CMS?

A headless CMS is very similar to a database. It provides an API and a user interface to manage your content, and of course you can also retrieve the content through the API. In almost every use case you want to apply a filter so that you only query the content items you actually need.
The question is: How fast are these queries and filters?

In this article we describe the problem using an example project: Our goal is to build a travel website where the user can search for hotels. For each hotel we also store the name of the location (city). Our website has one landing page per city where we show the hotels in this city, and we also provide a search where the user can enter a location. Of course we only want to show the hotels in the selected location, and therefore we need to apply a filter before we display the hotels to the user. If we have only very few hotels (let's say 50) we can just fetch all hotels from the headless CMS and then reduce our result set to the hotels in the current location. But if we scale our business this is not an appropriate solution. The Expedia hotel database has more than 300,000 hotels, and if we assume that each hotel has 100 kB of text, we would have to download 30 GB of data for each request. Therefore we must apply the filter as early as possible.

Fortunately, many headless CMSs provide filtering in their API, but the question is: How fast is this?

The CMS needs to store your content somewhere. Usually it either stores the content directly on a disk or in a database, which also persists the content on a disk. Usually this is an SSD, which makes everything much faster than a hard disk. In this example we assume that the headless CMS stores everything on disk, because it makes our example a little bit easier and we can still apply the same techniques that a database engine would use. To get the matching hotels we have to loop over all the hotels in our database file and test whether each hotel matches the given location or not.

In pseudo code it would look like this:
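A minimal Python sketch of such a full scan. It assumes, purely for illustration, that the database file holds one JSON record per line and that each record has a `location` field; the function name `scan_hotels` and the sample records are made up.

```python
import io
import json
from typing import Iterable, Iterator


def scan_hotels(records: Iterable[str], location: str) -> Iterator[dict]:
    """Yield every hotel in the given location by reading ALL records."""
    for line in records:
        # Every record has to be read and parsed ...
        hotel = json.loads(line)
        # ... before we can even test whether we need it.
        if hotel["location"] == location:
            yield hotel


# Tiny in-memory stand-in for the database file (hypothetical sample data).
sample_file = io.StringIO(
    '{"id": 1, "name": "Hotel A", "location": "Berlin"}\n'
    '{"id": 2, "name": "Hotel B", "location": "Munich"}\n'
    '{"id": 3, "name": "Hotel C", "location": "Berlin"}\n'
)

berlin_ids = [hotel["id"] for hotel in scan_hotels(sample_file, "Berlin")]
```

The cost of this loop grows with the total size of the file, not with the number of matching hotels.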

If we assume a state-of-the-art SSD that can read 2,000 MB per second, it would take around 15 seconds just to fetch and scan the 30 GB of hotel data. Of course it is not that simple:

Overall it is hard to estimate the duration of a query, but usually it is very slow. If we only have one user and our code and servers are very fast, we can perhaps deliver the results in 30 seconds, but it is very likely that our query takes several minutes to complete. Some systems also have an upper limit on how long a query may take, so the query would just time out and not deliver a result even though there is one in the database. Of course this is not acceptable and we have to find a better solution.
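The best-case figure behind this estimate can be reproduced with a short calculation. This sketch only uses the numbers from the example (300,000 hotels, 100 kB each, 2,000 MB per second) and treats 1 MB as 1,000,000 bytes. It ignores parsing, network transfer and concurrent users, which is why real queries end up much slower.

```python
HOTELS = 300_000
BYTES_PER_HOTEL = 100 * 1000                 # 100 kB of text per hotel
SSD_BYTES_PER_SECOND = 2000 * 1000 * 1000    # 2,000 MB/s sequential read

total_bytes = HOTELS * BYTES_PER_HOTEL               # size of the full data set
scan_seconds = total_bytes / SSD_BYTES_PER_SECOND    # raw read time, nothing else

print(f"{total_bytes / 1e9:.0f} GB to read, about {scan_seconds:.0f} s at best")
```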

One solution is not to allow such big data sets. For example, Contentful [1] only allows 25,000 (or 50,000) records per project. GraphCMS [2] has the same limitation. Of course this makes our queries much faster. Because we also want to store other content, we probably have no more than 10,000 records per content type. Filtering 10,000 hotels is of course 30 times faster than filtering 300,000 hotels, but it also means that we cannot use such a headless CMS for our use case. But there is a better solution:

The reason why our queries are so slow is that we have a lot of hotels and that each hotel is very big. If we could keep the hotels in memory it would be much faster, but to do that we have to reduce the size. For our query we are only interested in the location (city) of each hotel, so we can optimize our headless CMS by keeping an optimized data structure in memory, an index, that only contains the ID and location for each hotel. As a table it would look like this:

In pseudo code it would look like this:
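A minimal Python sketch of the idea, assuming the in-memory structure is a dictionary that maps each location to the IDs of the hotels in it. The helper names and the three sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical (id, location) rows: the only two fields the index keeps.
hotel_locations = [
    (1, "Berlin"),
    (2, "Munich"),
    (3, "Berlin"),
]


def build_location_index(rows: Iterable[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Group hotel IDs by location so a lookup does not touch the disk."""
    index: Dict[str, List[int]] = defaultdict(list)
    for hotel_id, location in rows:
        index[location].append(hotel_id)
    return index


def find_hotel_ids(index: Dict[str, List[int]], location: str) -> List[int]:
    """Return the IDs of all hotels in a location (empty list if unknown)."""
    return index.get(location, [])


location_index = build_location_index(hotel_locations)
berlin_ids = find_hotel_ids(location_index, "Berlin")
```

Only the hotels whose IDs come back from the lookup have to be loaded from disk afterwards.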

In some cases we can also use the index even though it does not cover all fields of our query. If we also filter by the rating and want to have all five-star hotels in a location we can get the results with the following pseudo code:
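A minimal Python sketch of this partially covered query. It assumes a location index like the one described above and a hypothetical `load_hotel` function that stands in for reading one full hotel record from disk. The sample data is made up.

```python
from typing import Callable, Dict, Iterator, List


def five_star_hotels(
    index: Dict[str, List[int]],
    load_hotel: Callable[[int], dict],
    location: str,
) -> Iterator[dict]:
    """Use the index for the location, then check the rating per record."""
    for hotel_id in index.get(location, []):
        hotel = load_hotel(hotel_id)      # one disk read per candidate
        if hotel["rating"] == 5:          # the rating is not in the index
            yield hotel


# Hypothetical stand-ins for the disk and the in-memory index.
hotels_on_disk = {
    1: {"id": 1, "location": "Berlin", "rating": 4},
    2: {"id": 2, "location": "Munich", "rating": 5},
    3: {"id": 3, "location": "Berlin", "rating": 5},
}
location_index = {"Berlin": [1, 3], "Munich": [2]}
loaded_ids: List[int] = []


def load_hotel(hotel_id: int) -> dict:
    loaded_ids.append(hotel_id)           # track which records were fetched
    return hotels_on_disk[hotel_id]


berlin_five_star = [
    hotel["id"] for hotel in five_star_hotels(location_index, load_hotel, "Berlin")
]
```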

In this example we have to fetch a lot of hotels from the disk that we don’t need (4 stars or less), but at least we do not have to fetch the hotels from other locations.

We have a strategy now to boost our performance but there are a few problems:

We have 2 options now:

If we use a database in our headless CMS, we could expose the functionality to create indexes to developers. This is complicated, because to create the appropriate index we have to analyze our queries first. Database engines provide tools for this, and you can analyze complex diagrams to understand what actually happens under the hood when a query is executed:

There is no alternative to these diagrams, and therefore the headless CMS has to provide the same functionality. This adds a lot of complexity to the system and does not really solve the problem. The first issue is that sometimes we have to change our content structure to use indexes. For example, you also need an index for sorting. Older versions of MySQL (a popular database engine) did not support descending indexes, which was a problem if, for example, you wanted to show a list of products with the highest price first. A solution was to maintain an additional field with a “negative price”, i.e. the price multiplied by -1.

The higher the price, the lower the negative price, so you can sort by the negative price in ascending order to show the highest price first.
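The effect of the extra field can be illustrated with a short Python sketch. The product records and the field name `negative_price` are made up, and a plain in-memory sort stands in for the ascending index of the database.

```python
# Hypothetical products; in the CMS these would be content items.
products = [
    {"name": "A", "price": 10},
    {"name": "B", "price": 30},
    {"name": "C", "price": 20},
]

# Maintain the additional field whenever a product is written.
for product in products:
    product["negative_price"] = -product["price"]

# Ascending order on the negative price ...
by_negative_price = sorted(products, key=lambda p: p["negative_price"])

# ... lists the most expensive product first.
names = [p["name"] for p in by_negative_price]
```

The downside is visible even in this small example: the content structure now carries a field that only exists to satisfy the database.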

Creating the correct indexes requires a lot of detailed knowledge about the database in use and is not a task for a junior developer. As headless CMS developers we also risk exposing details about the underlying database to our users, and we increase the coupling because we cannot switch to another technology that easily.

We can also create indexes on the fly by analyzing queries. Our headless CMS records which fields are used in our queries and how often they are executed. If a query is very slow and is used very often, we create an index for it. Managing indexes is not easy for a developer and much harder if we want to automate the process. Because each index consumes resources, both to keep it in memory and to update it when new records are added to the headless CMS, we also have to destroy indexes when they are no longer needed. It also means that our website might be very slow at the beginning and only become faster over time, which is not always acceptable, because we want to have a great experience for all users.
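A minimal Python sketch of the bookkeeping behind such automatic index creation. The class name `IndexAdvisor` and both thresholds are hypothetical, and the sketch only decides when an index should be created; dropping unused indexes and actually building them are left out.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


class IndexAdvisor:
    """Count slow queries per field combination and suggest indexes."""

    def __init__(self, slow_ms: float = 500.0, min_slow_runs: int = 3) -> None:
        self.slow_ms = slow_ms              # assumed threshold for "very slow"
        self.min_slow_runs = min_slow_runs  # assumed threshold for "very often"
        self.slow_runs: Counter = Counter()
        self.indexed_fields: Set[Tuple[str, ...]] = set()

    def record_query(self, fields: Iterable[str], duration_ms: float) -> bool:
        """Record one executed query; return True if an index should be created now."""
        key = tuple(sorted(fields))
        if duration_ms < self.slow_ms or key in self.indexed_fields:
            return False
        self.slow_runs[key] += 1
        if self.slow_runs[key] >= self.min_slow_runs:
            self.indexed_fields.add(key)
            return True
        return False


advisor = IndexAdvisor(slow_ms=500.0, min_slow_runs=3)
decisions = [advisor.record_query(["location"], 800.0) for _ in range(3)]
```

In this sketch the first two slow queries only increase the counter, so the first users still get the slow path before any index exists.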

Both solutions are problematic, because an index consumes a lot of resources, and if we allow the system or the user to create indexes on the fly it puts a lot of stress on our database. This only works if we create an isolated database server for each customer, which is expensive. It can only work if you provide a hosted headless CMS solution for $500 or more per month. So it might be an option for GraphCMS and Contentful, but not for Squidex.

There are also working solutions:

If we do not use a SaaS offering and host the headless CMS ourselves, we have full control over the database and can create the indexes that we need. It is still a task for a senior developer, but if we build everything from scratch and do not use a headless CMS, we also have to optimize our queries.

Usually a query is relatively slow because there is no index. But there are a few exceptions:

As described above it is complicated to provide fast queries for all use cases, if not impossible. But there are a few things you can do:
