NOI @IBM, Cloud and AI

Making IT systems run smoothly and fixing problems faster

Industry

DevOps, ITOps

Client

IBM

Position

UX designer

When companies rely on complex IT systems, things can go wrong, just like a car breaking down or a phone suddenly freezing. Large enterprises have IT operations (ITOps) teams responsible for keeping everything running, but they deal with massive amounts of data and alerts. Finding the root cause of an issue quickly is a major challenge.

What I did on the project

I led the UX redesign of IBM Netcool Operations Insight (NOI, now Watson AIOps), an AI-powered platform that helps IT teams predict, prevent, and resolve incidents before they impact business operations. Through research, workshops, and cross-team collaboration, I improved the user experience while contributing to the Carbon 10 design system for consistency and scalability.

Discovery

We kicked off the project by defining the key challenges that IT operations teams faced, through stakeholder workshops, user research, and data analysis.

User research

Key pain points identified

01

Too many alerts, hard-to-diagnose issues: IT teams struggled with massive volumes of system alerts, making it difficult to pinpoint critical issues.

02

Slow troubleshooting process: identifying the root cause took too long, leading to extended downtime.

03

Manual workflows: teams relied on manual scripting and outdated tools, increasing resolution time.

04

Lack of automation: incident response processes were inconsistent, often requiring repetitive manual interventions.

User research

User personas

01

👩‍💻 Annette, IT operator: monitors system health, responds to alerts, and needs quick issue detection.

02

🧑‍🔧 Carlos, Operations engineer: troubleshoots incidents and looks for efficient debugging tools.

03

👨‍💼 RJ, Site reliability engineer: focuses on automating workflows to ensure proactive issue resolution.

User research

ITOps goals of NOI

01

Diagnose, troubleshoot, and resolve issues as quickly as possible.

02

Review AI-generated analytics policies and create triggers to group and prioritize events.

03

Easily define a runbook for resolving an issue.

The solution: a smarter IT operations platform


Automation & AI-powered insights

Problem solved: reduces manual intervention by enabling AI-driven detection and prioritization of incidents.
🔹 Automates issue detection, reducing false alarms.
🔹 Uses AI to correlate alerts and highlight critical incidents faster.
🔹 Helps teams focus on real problems rather than sifting through thousands of notifications.

Faster troubleshooting with historical data

Problem solved: helps teams quickly find the root cause of an issue.
🔹 Provides historical system performance data to identify patterns.
🔹 Displays AI-generated insights for faster resolution.
🔹 Reduces downtime by improving incident detection speed.

Runbook & rules creation for incident response

Problem solved: standardizes and automates responses to common IT issues.
🔹 Runbooks: teams can define step-by-step resolution workflows for common issues.
🔹 Automation rules: teams can trigger automated responses when specific alerts occur.
🔹 Collaboration features: enables teams to edit, review, and deploy runbooks seamlessly.

Processes

User testing & iterations

User testing provided critical insights into how people interacted with the features, highlighting pain points and areas for improvement. This feedback let us refine the design and functionality, making the experience more intuitive and effective.

We conducted multiple rounds of user testing with IT operators, engineers, and site reliability professionals.

Findings from testing:

✔️ Users needed a clearer interface to navigate complex data quickly.

✔️ The troubleshooting flow was initially too complex, so we simplified it based on feedback.

✔️ Automation features needed better customization, so we added configurable rules.

Final adjustments:

✔️ Streamlined the incident response workflow for faster resolutions.

✔️ Improved dashboard UI to enhance data visibility.

✔️ Migrated the front end from Angular to React to adopt the new Carbon 10.

Carbon adoption:

Adopting the Carbon design system guidelines across the products in my portfolio brought consistency and efficiency to our design process, ultimately improving the overall user experience.

Outcome

We successfully onboarded customers to the new UI, which led to increased usage. The new features also reduced mean time to resolution (MTTR) by 25%, demonstrating the effectiveness of our detection and resolution enhancements.