Obsessive-compulsive disorder (OCD) is a serious, disabling, and chronic psychological disorder. Reinforcement learning, a model commonly used in computational psychiatry, can both explain the pathological mechanisms underlying the onset and maintenance of OCD symptoms and suggest potential targets for new therapies. This study examined performance on a probabilistic reversal learning (PRL) task in 32 patients with OCD and 35 healthy controls under a self-awareness initiation condition and a control condition. Across the task as a whole, both the OCD group and the healthy control group showed decreased accuracy and increased reaction time during the probability reversal stage, but only the healthy control group showed a decrease in the inverse temperature parameter and an increase in the persistence parameter. In the probability reversal stage, the inverse temperature parameter of OCD patients was significantly higher than that of the healthy control group, indicating that OCD patients exhibited more exploratory behavior. Under the self-awareness initiation condition, the inverse temperature parameter of the OCD group was significantly lower than under the control condition, indicating that self-awareness initiation effectively improved the reinforcement learning performance of OCD patients. These findings suggest that the difference in reinforcement learning between OCD patients and healthy individuals stems from OCD patients' enhanced behavioral exploration in response to environmental change, and that self-awareness initiation can improve their reinforcement learning performance. On this basis, the study proposes a self-awareness training program for OCD patients, which offers insights for OCD intervention.
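For readers unfamiliar with the two model parameters named above, the following is a minimal sketch of how an inverse temperature and a persistence parameter typically enter a reinforcement learning model of a two-option PRL task. It is not the study's model: the function names, the learning rate `alpha`, the reward probability, the trial counts, and the single reversal point are all illustrative assumptions. The sketch uses the conventional softmax form, in which a larger inverse temperature `beta` makes choices track value differences more closely; the study's own parameterization may differ.

```python
import math
import random


def choice_prob_a(q_a, q_b, beta, persistence, last_choice):
    """Probability of choosing option A (index 0) under a softmax rule.

    beta        : inverse temperature (assumed conventional form: larger
                  beta means choices follow the value difference more closely)
    persistence : bonus added to whichever option was chosen on the
                  previous trial (last_choice is 0, 1, or None)
    """
    bonus_a = persistence if last_choice == 0 else 0.0
    bonus_b = persistence if last_choice == 1 else 0.0
    logit = beta * (q_a - q_b) + bonus_a - bonus_b
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_prl(beta, persistence, alpha=0.3, n_trials=160,
                 reversal_at=80, p_reward=0.8, seed=0):
    """Simulate one hypothetical agent; return a list of booleans marking
    whether each choice was the currently better option.

    All default values are illustrative, not taken from the study.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]          # learned values of options A and B
    last_choice = None
    correct = []
    for t in range(n_trials):
        # Option A is better before the reversal, option B afterwards.
        better = 0 if t < reversal_at else 1
        p_a = choice_prob_a(q[0], q[1], beta, persistence, last_choice)
        choice = 0 if rng.random() < p_a else 1
        p_win = p_reward if choice == better else 1.0 - p_reward
        reward = 1.0 if rng.random() < p_win else 0.0
        # Delta-rule update of the chosen option only.
        q[choice] += alpha * (reward - q[choice])
        last_choice = choice
        correct.append(choice == better)
    return correct


if __name__ == "__main__":
    outcome = simulate_prl(beta=5.0, persistence=0.5)
    pre = sum(outcome[:80]) / 80
    post = sum(outcome[80:]) / 80
    print(f"accuracy before reversal: {pre:.2f}, after reversal: {post:.2f}")
```

In this form the two parameters are separable: `beta` scales sensitivity to learned values, while `persistence` biases the agent toward repeating its last choice regardless of value.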