Jim Ferguson is Director of Product Management at RVO Health, a health and wellness software platform. He began his career as an IT analyst at Qwest Communications (now CenturyLink) and later managed user acceptance testing at Dex Media. Jim then transitioned to change management at R.H. Donnelley, now Dex One, and eventually became a product manager for its advertiser platform. Before his current role at RVO Health, Jim was the product owner of client platforms at DexYP.
In our conversation, Jim shares how he led an initiative to overhaul the once-manual process of approving or rejecting patient reviews on RVO Health’s platform and automate it with AI. He reflects on the challenges of making such complex technologies and processes intuitive on the user-facing side, and discusses how he gained buy-in and confidence from stakeholders along the way.
The initial problem was that we had a growing backlog of user-submitted reviews that hadn’t been vetted to ensure they met our editorial guidelines. Many of these reviews were about specific doctors, so we wanted to make sure they included a rating and comment about the doctor themselves rather than other factors like the office setting or parking availability. We have different places where we collect reviews on those aspects of the experience, but if someone gave a doctor a five-star review, we wanted to make sure it was attributed correctly. How was the actual care? Did they feel like they got the help they needed?
We found it important to capture the user’s review in the moment, not long after their appointment. From the user’s perspective, they wanted to provide feedback right after the experience, but our old process of manual review and approval took time. The person moderating the submission would send a message to the user letting them know whether the review had been approved or denied, but it was not a quick process, especially when we had a review backlog.
In the case of a rejected review, we’d communicate back to the user and explain why we couldn’t accept their comment. In those cases, we found that most people didn’t come back to make the minor edits that would get the review approved.
We realized that it would be better to engage the user while they were writing the review to prevent the common errors that trigger a rejection. They could make modifications right then and there as they were writing. Staffing humans to read reviews as they’re being written is not very realistic, so we looked for a technology solution. It had to understand written language and our editorial policy, and apply both in real time.
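For illustration, a real-time check along these lines could be a single call to a language model with the editorial guidelines in the prompt. The sketch below is a minimal example assuming an OpenAI-style chat API; the model name, prompt wording, and guidelines are placeholders, not details from RVO Health’s actual system.

```python
# Minimal sketch of a real-time guideline check. Everything here (model, prompt,
# guidelines) is illustrative and assumed, not RVO Health's implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EDITORIAL_GUIDELINES = """
- The review must be about the doctor's care, not the office, parking, or billing.
- Do not include drug brand names.
- No profanity or personal attacks.
"""

def check_review(draft: str) -> str:
    """Ask the model to judge a draft review against the guidelines as it's written."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the interview doesn't say which LLM was used
        messages=[
            {
                "role": "system",
                "content": (
                    "You moderate patient reviews. Apply these editorial guidelines and "
                    "answer APPROVE, REJECT, or UNSURE, followed by a one-line reason "
                    "that cites the relevant guideline.\n" + EDITORIAL_GUIDELINES
                ),
            },
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content

print(check_review("Dr. Smith was great, but the parking lot was always full."))
```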
Roughly 60 percent of reviews would get approved before we incorporated the AI tool. When we’re talking about millions of reviews, that’s a huge number of rejections that, much of the time, users never came back to address. With the addition of the AI tool, that approval number has gone up to about 80 percent.
From a user satisfaction perspective, it’s been hugely beneficial as well. One common frustration was that reviews would be declined for mentioning a drug’s brand name. Now, we can call that out as the user is writing the review. This has been a huge help for both us and the reviewer. We’ve seen a big increase not only in the number of reviews being submitted, but also in the tool’s overall accuracy compared to our human review process.
This was one of our engineering team’s first AI projects. They had to jump in and learn which LLM was best to use, how to work with it, how to train it, and more. For example, how many reviews could we run through before we had to retrain the AI on our editorial guidelines? We also hadn’t worked out all the details about how many characters someone could type before the AI forgets what the guidelines are.
Also, there’s a lot of sensitivity around projects that involve automating a person’s job. There were fears that we’d reduce headcount as a result of this initiative. However, the people in our group who were checking review submissions were responsible for other activities as well, so we made it clear that we were not trying to eliminate their roles. Frankly, this initiative was about freeing up their time.
There are some instances when AI can’t determine if the review meets the standards to be accepted. In those cases, we need the AI to say, “I can’t determine whether to approve or not” and then real operations people go in and apply their knowledge to make a final determination. The tool is able to approve or reject all the straightforward ones and then it sends the complex ones to a human.
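That escalation path can be thought of as simple triage around the model’s verdict. The sketch below is hypothetical and reuses the `check_review` call from the earlier example; the queue is just an in-memory list standing in for whatever tooling the operations team actually uses.

```python
# Hypothetical triage: handle the clear-cut cases automatically, escalate the rest.
human_review_queue: list[str] = []

def triage(draft: str) -> str:
    verdict = check_review(draft)  # e.g. "APPROVE ...", "REJECT ...", or "UNSURE ..."
    if verdict.startswith("APPROVE"):
        return "published"
    if verdict.startswith("REJECT"):
        return f"rejected: {verdict}"  # the cited guideline goes back to the reviewer
    # The model couldn't decide, so an operations person makes the final determination.
    human_review_queue.append(draft)
    return "escalated to a human"
```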
We collected a small set of historical data using previously reviewed submissions. Training the AI initially involved seeing if it reached the same conclusion as an operations person. We found a few gray areas, which likely had to do with human bias. From these findings, we identified areas to make consistency improvements.
We not only fed the AI old reviews with the results and compared its accuracy, but we also brought in stakeholders who are not involved in review approvals and let them decide if the decision seemed accurate. This enabled stakeholders to see firsthand how AI matches against their human judgment. The AI tool was also trained to provide feedback about why it rejects a review and cite a specific policy. That was very important for the overall project success.
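As a rough picture of what that backtest might look like, the sketch below compares the model’s verdict against the historical human decision for each review and reports the agreement rate. The data format and sample reviews are invented for the example.

```python
# Illustrative backtest against previously reviewed submissions. The field names
# and sample data are assumptions made up for this example.
historical_reviews = [
    {"text": "Dr. Lee listened carefully and explained my options.", "human_decision": "APPROVE"},
    {"text": "The waiting room was packed and parking was terrible.", "human_decision": "REJECT"},
]

def agreement_rate(reviews: list[dict]) -> float:
    """Fraction of reviews where the model reaches the same conclusion as the human did."""
    matches = 0
    for review in reviews:
        verdict = check_review(review["text"])  # e.g. "REJECT: mentions the office, not the doctor"
        matches += int(verdict.startswith(review["human_decision"]))
    return matches / len(reviews)

print(f"Agreement with historical human decisions: {agreement_rate(historical_reviews):.0%}")
```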
Balancing complexity and simplicity was definitely something we had to consider. For many of our stakeholders, this was the first AI project they were involved with. We explained why the initiative was so important and had them interact with the experience in person. We took a set of reviews that had been rejected and asked the stakeholder group to enter those reviews into the AI test environment to see if the AI’s feedback around the rejection was helpful enough. We provided guidelines for them to use, which helped us arrive at a good level of intuitiveness with the software.
The complexity came in trying to find a simple way to explain the feedback to a user and, at the same time, to our stakeholders: what is the system doing, and how is it doing it? We tried to rule out situations where editorial policies conflicted, and that took a few iterations. Of course, there were other implications, like cost. In some cases, we had to trade off between accuracy and the cost of the model.
When we first rolled out the AI solution, we continued to err on the side of caution and deployed it in parallel with a human for approvals. So, even though the tool provided real-time feedback to users and determined if it was approved or declined, we still had a human verify it for a few weeks. This last step was a real, live comparison of new reviews to ensure the AI was as accurate as we thought it was.
We’ve found the AI to be about 96 percent accurate in making that determination on whether to approve or reject a review, which is great. We started out with an objective of a 90 percent success rate, so it’s been amazing to exceed that. This last step also gave confidence to our stakeholders that this solution was worth the effort we invested in it.
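A parallel run like that can be as simple as recording both decisions side by side and measuring how often they agree. The sketch below assumes a basic in-memory log for the shadow period; in practice this would live in whatever data store the team already uses.

```python
# Sketch of a shadow-mode comparison: for a few weeks, every new review gets both an
# AI decision and a human decision, and we track how often they match.
from dataclasses import dataclass

@dataclass
class ShadowRecord:
    review_text: str
    ai_decision: str
    human_decision: str

shadow_log: list[ShadowRecord] = []

def record(review_text: str, ai_decision: str, human_decision: str) -> None:
    shadow_log.append(ShadowRecord(review_text, ai_decision, human_decision))

def live_agreement() -> float:
    """Share of shadow-period reviews where the AI matched the human verifier."""
    if not shadow_log:
        return 0.0
    return sum(r.ai_decision == r.human_decision for r in shadow_log) / len(shadow_log)
```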
It was surprising, even to our stakeholders, to see how reviews were rejected based on something as simple as including a drug’s brand name, or how many people tried to write a review about something that was not necessarily about the doctor — like how busy the doctor’s office was that day.
Being able to educate our users in real-time also led to some later changes we’ve implemented, such as detecting when someone starts writing about something we can’t include in the review. The AI now says something like, “It seems like you’re writing a review about the office. Would you like to submit this specifically about the office, or would you like to write a review about the doctor?”
This helps guide the reviewer based on what their intentions are. We want to support them and collect their information, but also make sure it’s in the right place.
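A nudge like that could be driven by a small subject check on the draft before it is submitted. This sketch is hypothetical and reuses the `client` from the first example; the prompt and the message shown to the user are illustrative, not RVO Health’s actual copy.

```python
# Hypothetical subject check: detect when a draft is really about the office and
# nudge the reviewer toward the right place. Reuses `client` from the earlier sketch.
def subject_nudge(draft: str) -> str | None:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify whether this review of a doctor is mainly about the DOCTOR "
                    "or about the OFFICE (staff, parking, waiting room, billing). "
                    "Answer with one word: DOCTOR or OFFICE."
                ),
            },
            {"role": "user", "content": draft},
        ],
    )
    if response.choices[0].message.content.strip().upper().startswith("OFFICE"):
        return (
            "It seems like you're writing a review about the office. Would you like to "
            "submit this about the office, or write a review about the doctor instead?"
        )
    return None  # the draft looks like it's about the doctor, so no nudge is needed
```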
As we were talking through the project, we related everything back to old-school programming: we explained that we were going to set up a set of rules and have the program enforce them. We also explained that the AI is looking for all the things a human would in a review, but it’s doing it with natural written language to determine the context of what’s being written. Is that word being used in a positive or negative context? AI is very good at the nuances of human language, and explaining it this way helped less technical stakeholders understand the nitty-gritty of what we were doing.
People usually write a review because they either had a great experience or a really bad one. They don’t typically write a review if the experience was mediocre. With that said, we don’t exclude any review just because it’s negative, or keep it because it’s positive. We want to make sure that we’re being objective and accurate. Having stakeholders interact with the system and the process helped them understand it and build trust in it.
There are always nitpicky things, like getting stakeholders involved even earlier. Of course, we work with a broad set of stakeholders, but it would’ve been helpful to get specific executives’ understanding and buy-in earlier in the process.
Also, our initial intention was to process reviews faster and alleviate the backlog. Adding real-time feedback to the submission process was not a feature we thought of immediately. If we had thought of it from the start, I think there would’ve been some benefit to treating those two initiatives a little more separately.
Given the overall success of the project, we’re taking these learnings and applying them to other initiatives.
The idea came from the product group. We already had some basic rules for reviews, like looking for profanity, a set of drug names, or other specific words. We asked product managers who weren’t working on this tool for feedback. They came back with what turned out to be a very valuable suggestion.
Someone said, “Wouldn’t it be great if you could tell me, after I finished the sentence, that I can’t say that?” Providing feedback in real time helps users feel like we’re being proactive and saves them time. Bringing in all those different points of view and leveraging the talent we have in-house was an amazing unlock for us; it was incredibly helpful to build this solution keeping those unique perspectives in mind.