Observability @Elastic

Making websites run smoothly with Elastic Observability

Industry

DevOps, SREs

Client

Elastic

Position

Senior product designer

Have you ever been on a website that takes forever to load or suddenly stops working? Frustrating, right? Websites and apps face these issues all the time, and businesses need to catch them before end users experience problems.

What I did on the project

As part of the Observability team, I led the design of features to help front-end teams identify and fix these issues early, ensuring websites and apps run smoothly.

The problem

Our research and conversations with customers revealed four pain points in monitoring website performance:

01

Missing information

With Elastic’s existing offering, teams lacked insight into how users interacted with their websites. Clicks, scrolling, and button presses weren’t tracked in one place, making it hard to get a complete picture of user behavior.

02

Slow websites

Identifying and resolving front-end performance issues was difficult. Websites took too long to load due to factors like heavy images, inefficient code, or excessive processes running at once.

03

It works here, but not there

A site might work perfectly on one browser or device but glitch on another. Teams had no easy way to check performance across different browsers, devices, and locations.

04

Too much manual work with YAML management

Setting up monitors required writing extensive YAML configuration by hand, which was time-consuming and difficult to maintain.

The research

To address these challenges, we conducted in-depth user research. We interviewed DevOps engineers, front-end teams, and SREs to understand their workflow and pain points. The common theme was the need for a unified, intuitive tool to streamline monitoring and improve site performance.

The solution: a smarter monitoring tool


We shipped several key features to help teams identify and resolve issues before they negatively impacted end users:

A single dashboard

Problem solved: Centralized tracking for errors, speed, and user actions in real time.
Teams can now monitor their websites and apps from a single, unified dashboard, giving them visibility into performance across all metrics.

Visual reports

Problem solved: Interactive charts and graphs, like the new Exploratory View, make it easy to spot issues early.
By visualizing data in real time, teams can identify performance bottlenecks, site crashes, or slow load times before they affect users.

Easy test recorder and monitor creation

Problem solved: No need for complex coding or writing long scripts.
Teams can now create tests using an intuitive point-and-click tool that simulates real user behavior. These tests can be logged and tracked directly on the platform, reducing the time and effort spent on manual monitoring.


Error triage page

Problem solved: Easier debugging and quicker resolutions.
This feature allows teams to compare past and present test results side by side, making it easy to identify when and where things went wrong.

User testing approach

User testing showed us where the experience was confusing. Acting on participants’ feedback, we made the design simpler, faster, and more fun to use, and made sure it worked the way users needed it to.

Outcome: faster, more reliable websites


With the launch of this tool, our customers can find and fix problems faster. That means fewer crashes, less lag, and a better experience for the people using their websites and apps.