Emotional information can influence cognitive processing across sensory modalities, shaping how individuals perceive and evaluate social cues. However, the extent to which instructional focus modulates this cross-modal interference remains unclear. Building on previous findings on affective auditory-visual interactions, this study investigates how attentional orientation modulates emotional interference by manipulating the instruction modality (visual vs. auditory) in affective evaluation, decision-making, and recognition tasks. Thirty-two undergraduate students (aged 18-27) were presented with 24 neutral faces, each paired simultaneously with an affective auditory stimulus (positive, negative, or neutral). Participants rated the emotional content of the stimuli on a 5-point Likert scale (1 = very negative, 5 = very positive) and responded to social decision-making questions (e.g., "Would you be friends with this person?") while instructed to focus on either the auditory or the facial stimuli. After the first block of 12 stimuli, the instruction type switched, with the order of instructions counterbalanced (visual-then-auditory vs. auditory-then-visual focus). In a subsequent recognition task, participants judged whether faces were "old" (previously presented) or "new." Emotion evaluation scores were higher under visual than under auditory instruction across all categories (p < .05). Neutral faces paired with negative auditory stimuli received the lowest ratings, with a greater reduction under auditory instruction (p < .05). Although affective auditory stimuli influenced decision-making, this effect did not differ significantly between instruction conditions (p > .05). Recognition accuracy was significantly higher under visual instruction across all categories (p < .05). These findings suggest that the disruptive effect of emotional valence on cognitive performance is instruction-dependent.
When attention is directed toward the auditory modality, affective cues exert cross-modal interference on visual memory, whereas visual focus provides relative protection from this interference. These results provide insights into how attentional focus modulates emotional interference and may have implications for instruction-based approaches to supporting attentional control and reducing emotion-driven disruptions in cognition.