Most teams don't monitor the things that actually break.
When it comes to monitoring, the usual checks cover:
CPU
Memory
Disk
Database
Application availability
If a server goes down, everyone finds out quickly.
Alerts trigger. Dashboards turn red. An investigation begins.
But over the last few years at Olisen Studio, we’ve noticed an interesting pattern: the most troublesome incidents rarely involve server crashes.
It is usually something else that breaks. And these are precisely the kinds of problems that often go unnoticed the longest.
A modern SaaS product rarely exists in isolation.
Even a relatively simple application might depend on dozens of external services:
Stripe
OpenAI
Telegram
GitHub
Notion
Slack
Internal APIs
A failure in any of these dependencies can impact a critical product function.
For example:
an OpenAI API key expires;
a Stripe webhook stops delivering events;
an OAuth token loses necessary permissions;
the GitHub API starts returning errors;
a background task fails to execute;
an integration stops receiving events.
Yet, from an infrastructure perspective, everything looks fine.
The site loads. The database is running. The server is responding. Monitoring shows a green status.
But some functionality is already unavailable.
Traditional monitoring answers the question:
Is the system running?
But for the user, a different question matters more:
Is the feature they came for actually working?
Imagine a service that automatically publishes content via a third-party API.
If that API stops accepting requests:
the application keeps running;
users can still log in;
the server shows no critical errors.
However, the product's core value disappears. Sometimes, such failures are detected after a few hours. Sometimes after a few days. And sometimes only after a customer contacts support.
This gap between when a problem occurs and when it is detected is often the costliest part of the incident.
Just a few years ago, most failures occurred within a company's own infrastructure.
Today, the situation has changed.
Each external service a product depends on is now a potential point of failure.
Moreover, many of these services lie outside the development team's control.
You cannot fix a bug in OpenAI. You cannot influence Stripe's infrastructure. You cannot force the GitHub API to run faster.
But you can find out about the problem much sooner.
And the speed of detection directly impacts the scale of the consequences.
In practice, it makes sense to monitor not just the infrastructure, but also critical business processes.
For an external service, it is important to check not only that it is available but also that it returns correct results.
An API might return an HTTP 200 status code while still delivering erroneous data or incomplete results.
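A minimal sketch of such a correctness check, in Python. The function name `check_response`, the `CheckResult` type, and the idea of a list of required fields are illustrative assumptions, not part of any particular API; the point is only that a check should inspect the body, not just the status code.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    reason: str


def check_response(status_code: int, payload: dict,
                   required_fields: tuple[str, ...]) -> CheckResult:
    """Treat a response as healthy only if the status AND the body look right.

    `required_fields` is a hypothetical list of keys the caller expects to be
    present and non-empty in a correct response.
    """
    if status_code != 200:
        return CheckResult(False, f"unexpected status {status_code}")
    # A 200 with missing or empty fields is still a failure for the user.
    missing = [f for f in required_fields if payload.get(f) in (None, "", [])]
    if missing:
        return CheckResult(False, f"missing or empty fields: {', '.join(missing)}")
    return CheckResult(True, "ok")
```

In a real check the payload would come from the actual HTTP call, and the list of required fields would be chosen per integration.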
Many processes rely on incoming events.
If events stop arriving, the product may appear fully functional for a long time.
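One way to catch this is to track when the last event arrived and alert on silence. The sketch below assumes the application already stores a timestamp for the most recent event per integration; the function name and the `max_silence` threshold are hypothetical and would need tuning to the normal traffic pattern.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_event_stream_stale(last_event_at: datetime,
                          max_silence: timedelta,
                          now: Optional[datetime] = None) -> bool:
    """Return True if no event has arrived within the allowed silence window.

    `last_event_at` is assumed to be a timezone-aware timestamp recorded by
    the application whenever an incoming event (e.g. a webhook) is processed.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    return current - last_event_at > max_silence
```

A threshold that is too tight will produce false alarms during quiet periods, so it usually makes sense to derive it from the longest normal gap between events.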
Expired tokens, lost permissions, and incorrect credentials remain among the most common causes of silent failures.
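For credentials with a known expiry date, part of this can be checked ahead of time. The following is a rough sketch that assumes the expiry timestamp is stored somewhere the monitor can read it; `credential_status` and the seven-day warning window are illustrative choices, not a standard.

```python
from datetime import datetime, timedelta, timezone


def credential_status(expires_at: datetime,
                      now: datetime,
                      warn_before: timedelta = timedelta(days=7)) -> str:
    """Classify a credential as 'expired', 'expiring_soon', or 'valid'.

    Both timestamps are assumed to be timezone-aware.
    """
    if expires_at <= now:
        return "expired"
    if expires_at - now <= warn_before:
        return "expiring_soon"
    return "valid"
```

An expiry check does not cover revoked keys or lost permissions. Those can only be detected by making a real authenticated request with the credential and checking the outcome.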
Queues, schedulers, and cron jobs often fail without triggering infrastructure monitoring alerts.
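A common pattern here is a "dead man's switch": each job records its last successful run, and a separate check flags jobs that have been silent for too long. The sketch below assumes those last-success timestamps are available as a dictionary; `JobExpectation` and `overdue_jobs` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class JobExpectation:
    name: str
    interval: timedelta  # how often the job is expected to succeed
    grace: timedelta     # extra slack before the job counts as overdue


def overdue_jobs(expectations: list[JobExpectation],
                 last_success: dict[str, Optional[datetime]],
                 now: datetime) -> list[str]:
    """Return the names of jobs that have not succeeded recently enough."""
    overdue = []
    for job in expectations:
        last = last_success.get(job.name)
        # A job that has never reported success is treated as overdue.
        if last is None or now - last > job.interval + job.grace:
            overdue.append(job.name)
    return overdue
```

The check alerts on the absence of a success signal, so it also catches jobs that never started, which error-based alerting would miss.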
It is useful to verify actual user scenarios:
are payments going through?
is content being created?
are notifications being delivered?
is data synchronizing?
are key integrations working?
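These questions can be expressed as small synthetic checks that run on a schedule. The following sketch assumes each scenario is a function that returns True when the workflow succeeded; `run_scenarios` and the scenario names are hypothetical, and the probes themselves would be written per product.

```python
from typing import Callable


def run_scenarios(scenarios: dict[str, Callable[[], bool]]) -> dict[str, str]:
    """Run each user-scenario probe and report 'pass', 'fail', or an error.

    A probe that raises is reported as an error rather than stopping the run,
    so one broken integration does not hide the state of the others.
    """
    results: dict[str, str] = {}
    for name, probe in scenarios.items():
        try:
            results[name] = "pass" if probe() else "fail"
        except Exception as exc:  # report any probe failure, whatever its type
            results[name] = f"error: {exc}"
    return results
```

In practice a payment probe might create a charge in the provider's test mode, and a notification probe might send a message to a dedicated test channel and confirm delivery.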
Infrastructure monitoring remains essential. But on its own, it is no longer enough.
Modern products rely increasingly on external services, APIs, and integrations. Consequently, the question:
"Is the server running?"
is gradually giving way to another:
"Is the product working as a whole?"
A significant number of incidents today arise precisely at the level of dependencies, integrations, and user workflows.
And that is exactly where many teams remain in the dark.
At Olisen Studio, we encountered this issue while working on SaaS projects and internal tools. That is why we are currently exploring approaches to monitoring external dependencies, APIs, webhook integrations, and background processes.
If this topic resonates with you, or if you have dealt with similar incidents, we would love to discuss your experience.
Checklane: https://checklane.olisen.studio/
We'll figure it out together and show you how to solve the problem quickly and effectively.