In recent years, non-functional requirements have become increasingly important when building web applications. Whereas satisfying the functional requirements of an application often drives the backend design, developers are increasingly forced to consider non-functional requirements, especially performance, early in the design process. But what if performance was not fully accounted for from the start? Not necessarily out of negligence; perhaps the backend was initially expected to provide a service to only a limited number of users.
For example, a web server that started as an internal tool for a small team at work may now be adopted by the entire company, even internationally! In this article, we tackle this question by describing our approach to monitoring backend performance and detecting bottlenecks, based on our experience with SoundTalks, a company that specializes in monitoring the health of pigs by deploying IoT devices on farms to collect and process sensor data.
When it comes to testing software, unit testing is absolutely necessary to ensure that individual software components satisfy their functional requirements, i.e. that they produce the expected output for a given input. In practice, software components rarely work in isolation, but instead must be integrated to produce complex results, which is covered by integration testing. Both unit testing and integration testing focus on testing functionality. But how do we test for performance, which is a non-functional requirement? That is what stress testing is all about!
Stress-testing a backend server takes place under extremely heavy load conditions, e.g. by swarming it with requests, in order to evaluate its robustness and error-handling capabilities. This process builds confidence in the stability and reliability of web applications, as it reveals the backend's current limits in terms of scalability. One possible strategy is to swarm the backend with requests until it crashes ('500 Internal Server Error'), which is not only entertaining, but also incredibly valuable, as it exposes exactly where those limits lie.
Alternatively, a perhaps more practical approach is to simulate peak user loads that are expected or already present in the real world (i.e. the production environment). From our experience, these two strategies work best combined: current peak user loads are a good starting point for stress-testing the backend, after which the load can be increased incrementally to discover its limits, i.e. until server response times start violating Service Level Agreements ("SLAs") or the server crashes.
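This incremental strategy can be sketched as a simple loop. Note that the SLA threshold, step size, and the `measure_p95_ms` stand-in below are all hypothetical; in practice, the measurement would come from an actual stress-testing session run at the given user count.

```python
SLA_P95_MS = 500  # assumed SLA: 95th-percentile response time under 500 ms

def measure_p95_ms(users: int) -> float:
    """Placeholder for a real measurement; here response time simply grows with load."""
    return 120 + 0.9 * users

def find_load_limit(start_users: int, step: int = 50, max_users: int = 5000) -> int:
    """Return the highest tested user count whose p95 response time still meets the SLA."""
    users = start_users
    last_ok = 0
    while users <= max_users:
        if measure_p95_ms(users) > SLA_P95_MS:
            break  # SLA violated: the previous load level is the current limit
        last_ok = users
        users += step
    return last_ok

print(find_load_limit(start_users=100))  # 400 with this toy model
```

The starting load would be the current production peak, and each step mimics re-running the stress test with more simulated users until the SLA is violated.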
Naturally, developers need a framework in which to define the stress tests for the backend. At Panenco, from our experience building a wide range of web applications across different domains, we consider stress testing an integral part of the software development process. As such, we opted for locust.io, an open-source load-testing tool with a 'test as code' feature, enabling developers to write their stress tests in pure Python. It is important to note that while the stress tests are written in Python, locust.io imposes no restrictions on the backend tech stack: the backend can be written in any programming language. For new endpoints in the backend, it has become standard practice to complement our unit and integration tests with the corresponding stress tests. But how do we stress-test existing endpoints?
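To give a flavour of the 'test as code' feature, here is a minimal locust.io sketch; the endpoint path is illustrative, not an actual SoundTalks route:

```python
from locust import HttpUser, task, between

class DashboardUser(HttpUser):
    # Simulated think time between consecutive requests of one user.
    wait_time = between(1, 3)

    @task
    def health_overview(self):
        # Hypothetical endpoint; replace with a real backend route.
        self.client.get("/api/health-status/")
```

Running `locust -f locustfile.py --host https://staging.example.com` then spawns the configured number of simulated users against the target host.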
We tackled the challenge of stress-testing existing endpoints by inspecting typical request flows that the backend already receives in the production environment. More concretely, SoundTalks provides its users with several dashboards from which they can manage devices in the field and monitor their pigs' health status. By inspecting the network traffic through Chrome's developer tools, we obtained an overview of the relevant endpoints that are called as users navigate through each dashboard, which then drove the stress test design.
In locust.io, we grouped together the requests to the backend for each dashboard view, and we can specify whether they are executed at random or in sequence, thus simulating user load. Each stress-testing session has a configurable duration and number of users, and results in a report with a clear overview of response times for each endpoint, as shown in Figure 1 and Figure 2 for example applications. Inspecting this report enables us to gain important insights into the performance of the backend, and to detect the endpoints with the highest response times. These insights can already point developers in the right direction to improve backend performance, great! Wouldn't it be invaluable though if we could zoom deeper into these response times, breaking them down into the main tasks that the backend had to execute for each endpoint? Yes please, and that is why we also use sentry.io in our performance monitoring setup.
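Such a grouped, in-order flow can be expressed with locust's `SequentialTaskSet` (a plain `TaskSet` would instead pick tasks at random); the endpoint paths below are again illustrative:

```python
from locust import HttpUser, SequentialTaskSet, between, task

class DeviceDashboardFlow(SequentialTaskSet):
    # Tasks run in declaration order, mimicking a user opening the
    # device dashboard and the requests the browser fires as a result.
    @task
    def list_devices(self):
        self.client.get("/api/devices/")

    @task
    def device_health(self):
        self.client.get("/api/devices/health/")

    @task
    def done(self):
        self.interrupt()  # hand control back to the parent user

class SequentialDashboardUser(HttpUser):
    tasks = [DeviceDashboardFlow]
    wait_time = between(1, 3)
```

Multiple such task sets, one per dashboard view, can then be attached to the same user class to cover the whole application.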
Sentry.io is an open-source full-stack error tracking and performance monitoring system that integrates with the backend through a dedicated SDK, available for most popular programming languages. In our case, the backend server is written in Python, using the Django Rest Framework. Generally, configuring Sentry for the backend involves linking to a Sentry project URL, specifying the fraction of endpoint calls to be captured, along with a number of integrations that wrap the parts of the code to be monitored.
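For a Django backend, that configuration boils down to a few lines at startup; the DSN below is a placeholder and the sample rate is an assumed value:

```python
import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder project DSN
    integrations=[DjangoIntegration()],
    traces_sample_rate=0.2,  # capture performance data for 20% of endpoint calls
)
```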
Fortunately, the Python SDK provides a Django integration, making it easy to capture all database queries executed by the Django object-relational mapper (ORM), as shown by the example in Figure 3. However, some of the endpoints involve querying MongoDB, where the sensor data is stored. These queries were not automatically captured by Sentry, since the Python SDK does not provide a MongoDB integration, resulting in a "Missing instrumentation" grayed-out bar, which can also be seen in the example of Figure 3. Sentry does however provide the necessary tools to manually instrument code, so we developed our own integration for MongoDB to capture these database calls.
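A minimal sketch of such manual instrumentation, using the SDK's `start_span` around a query (the collection, field names, and connection string here are hypothetical, and our actual integration wraps this more generically):

```python
import sentry_sdk
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string

def find_sensor_data(device_id: str) -> list:
    # Wrapping the query in a span makes it appear in the Sentry
    # transaction timeline instead of as "Missing instrumentation".
    with sentry_sdk.start_span(op="db.mongodb", description="sensor_data.find"):
        return list(client.telemetry.sensor_data.find({"device_id": device_id}))
```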
Finally, we completed the performance monitoring setup by integrating it with our CI/CD setup, which runs on GitLab. On one hand, we created a separate repository containing the stress-testing code written for locust.io. A scheduled pipeline executes a stress-testing session on a daily basis, targeting the staging backend server, and generates the report upon completion. On the other hand, Sentry captures a percentage of these endpoint calls, which our developers can inspect to zoom into their response times. Furthermore, we have granted Sentry access to automatically create issues on GitLab for our backend and/or to link existing issues to Sentry reports, thus minimizing the manual overhead for the issue tracker. As a result, our developers can easily identify and resolve performance bottlenecks (e.g. querying the database while iterating over a collection rather than prefetching), resulting in up to orders of magnitude fewer database queries and thus shorter response times!
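The prefetching bottleneck mentioned above is the classic N+1 query pattern; with hypothetical Django models (a `Device` with a foreign key to a `Farm`), the fix looks roughly like this:

```python
# Slow: one extra database query per device while iterating (N+1 queries).
for device in Device.objects.all():
    print(device.farm.name)  # each .farm access hits the database

# Fast: a single joined query fetches all related farms up front.
for device in Device.objects.select_related("farm"):
    print(device.farm.name)  # no additional queries
```

For many-to-many or reverse relations, Django's `prefetch_related` plays the same role, trading N queries for a constant number.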
In summary, web applications are typically built to provide a service to a broad user audience; this is what drives both developers and product managers in the first place. As a web application becomes increasingly successful, so does its popularity among users, and thus its request load and/or number of connected devices. It is therefore crucial that the web application has already been stress-tested throughout its development, to ensure its reliability and availability under pressure. In this article, we described how we tackled the challenges of setting up performance monitoring for an existing real-world backend server, in the context of SoundTalks, involving thousands of users and connected IoT devices across the world. After all, humans may be vulnerable to the stress of fame and success, but that is no excuse for web applications😉.