
NOI @IBM, Cloud and AI
Making IT systems run smoothly and fixing problems faster
Industry
DevOps, ITOps
Client
IBM
Position
UX designer
When companies rely on complex IT systems, things can go wrong, just like a car breaking down or a phone suddenly freezing. Large enterprises have IT operations (ITOps) teams responsible for keeping everything running, but they deal with massive amounts of data and alerts. Finding the root cause of an issue quickly is a major challenge.
What I did on the project
I led the UX redesign of IBM Netcool Operations Insight (NOI, now Watson AIOps), an AI-powered platform that helps IT teams predict, prevent, and resolve incidents before they impact business operations. Through research, workshops, and collaboration, I enhanced the user experience while contributing to the Carbon 10 design system for consistency and scalability.
Discovery
We kicked off the project by defining the key challenges IT operations teams faced. This was done through stakeholder workshops, user research, and data analysis.
User research
Key pain points identified
01
Too many alerts, hard to diagnose issues: IT teams struggled with massive amounts of system alerts, making it difficult to pinpoint critical issues.
02
Slow troubleshooting process: identifying the root cause took too long, leading to extended downtime.
03
Manual workflows: teams relied on manual scripting and outdated tools, increasing resolution time.
04
Lack of automation: incident response processes were inconsistent, often requiring repetitive manual interventions.
User personas
01
👩‍💻 Annette, IT operator: monitors system health, responds to alerts, and needs quick issue detection.
02
🧑‍🔧 Carlos, operations engineer: troubleshoots incidents and looks for efficient debugging tools.
03
👨‍💼 RJ, site reliability engineer: focuses on automating workflows to ensure proactive issue resolution.

ITOps goals of NOI
01
Diagnose, troubleshoot, and resolve issues as fast as possible.
02
See AI-generated analytics policies and create triggers that group and prioritize events.
03
Define a runbook for a resolution in an easy way.

The solution: a smarter IT operations platform
Automation & AI-powered insights
Problem solved: reduces manual intervention by enabling AI-driven detection and prioritization of incidents.
🔹 Automates issue detection, reducing false alarms.
🔹 Uses AI to correlate alerts and highlight critical incidents faster.
🔹 Helps teams focus on real problems rather than sifting through thousands of notifications.
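To make the correlation idea concrete, here is a minimal sketch. It is not NOI's actual algorithm: it simply groups alerts that hit the same resource within a short time window and lists the most severe groups first. The `Alert` fields, the 300-second window, and the `correlate` function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    resource: str     # hypothetical field: the system that raised the alert
    timestamp: float  # seconds since an arbitrary reference point
    severity: int     # assumed scale: higher means more critical


def correlate(alerts, window=300.0):
    """Group alerts on the same resource that arrive within `window`
    seconds of the previous alert for that resource."""
    groups = []
    open_group = {}  # resource -> most recent group for that resource
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = open_group.get(alert.resource)
        if group and alert.timestamp - group[-1].timestamp <= window:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            open_group[alert.resource] = group
    # Most severe groups first, so critical incidents surface at the top.
    return sorted(groups, key=lambda g: max(a.severity for a in g), reverse=True)
```

In this sketch, four raw alerts on two resources could collapse into three groups, with the group containing the highest-severity alert shown first.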


Faster troubleshooting with historical data
Problem solved: helps teams quickly find the root cause of an issue.
🔹 Provides historical system performance data to identify patterns.
🔹 Displays AI-generated insights for faster resolution.
🔹 Reduces downtime by improving incident detection speed.


Runbook & rules creation for incident response
Problem solved: standardizes and automates responses to common IT issues.
🔹 Runbooks: teams can define step-by-step resolution workflows for common issues.
🔹 Automation rules: allow teams to trigger automated responses when specific alerts occur.
🔹 Collaboration features: enable teams to edit, review, and deploy runbooks seamlessly.
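The relationship between runbooks and automation rules can be sketched as follows. This is an illustrative model only, not the product's data format: the `Runbook` and `Rule` classes, the `runbooks_for` helper, and the sample alert fields are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Runbook:
    name: str
    steps: list = field(default_factory=list)  # ordered resolution steps


@dataclass
class Rule:
    matches: Callable[[dict], bool]  # predicate over an incoming alert
    runbook: Runbook                 # runbook to trigger when it matches


def runbooks_for(alert, rules):
    """Return the names of the runbooks whose rule matches the alert."""
    return [rule.runbook.name for rule in rules if rule.matches(alert)]


# Hypothetical example data.
disk_full = Runbook(
    "free-disk-space",
    ["Check disk usage", "Rotate logs", "Verify free space"],
)
rules = [Rule(lambda a: a.get("type") == "disk_full", disk_full)]
```

With these sample rules, an alert such as `{"type": "disk_full", "host": "db-01"}` would select the `free-disk-space` runbook, while an alert of any other type would select none.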



Processes
User testing & iterations
User testing provided critical insights into how people interact with the features, highlighting pain points and areas for improvement. This feedback enabled us to refine the design and functionality, ensuring a more intuitive and effective user experience.
We conducted multiple rounds of user testing with IT operators, engineers, and site reliability professionals.
Findings from testing:
✔️ Users needed a clearer interface to navigate complex data quickly.
✔️ The troubleshooting flow was initially too complex—we simplified it based on feedback.
✔️ Automation features needed better customization—we added configurable rules.
Final adjustments:
✔️ Streamlined the incident response workflow for faster resolutions.
✔️ Improved dashboard UI to enhance data visibility.
✔️ Moved from Angular to React in order to migrate to the new Carbon 10.
Carbon adoption:
Adopting the Carbon design system through the Carbon guild within my portfolio contributed to consistency and efficiency in design processes, ultimately enhancing the overall user experience.

Outcome
We successfully onboarded customers to the new UI, resulting in an increase in usage. The implementation of new features led to a 25% reduction in the mean time to resolution (MTTR), highlighting the effectiveness of our enhancements in detection and resolution processes.
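For readers unfamiliar with the metric, MTTR is the average time taken to resolve an incident, and a reduction is measured relative to the earlier value. The sketch below shows the arithmetic; the resolution times are hypothetical numbers chosen for illustration, not project data.

```python
def mttr(resolution_minutes):
    """Mean time to resolution: average of per-incident resolution times."""
    return sum(resolution_minutes) / len(resolution_minutes)


def reduction_percent(before, after):
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Hypothetical resolution times in minutes, not project data.
before = mttr([40, 80, 120])  # average of 80 minutes
after = mttr([30, 60, 90])    # average of 60 minutes
change = reduction_percent(before, after)  # 25.0
```

In this made-up example, the average drops from 80 to 60 minutes, which is a 25% reduction.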