AI Fundamentals. Part 6. Primary models of failure in AI
In this part of the lecture, Pavel Spesivtsev outlines five primary models of failure in AI and automation projects:

Hallucination: While often viewed negatively, hallucination allows AI to deviate from typical patterns and produce novel ideas, which is valuable for tasks like design, brainstorming, or spotting gaps in existing patterns. It becomes a serious failure point in domains that demand strict accuracy, such as legal, medical, or rule-based workflows. Mitigation involves limiting creativity and grounding responses in established facts, playbooks, or rulebooks whenever accuracy is required.

Context Overflow: Failure occurs when a model is overloaded with excessive, irrelevant information instead of only the specific, explicit context the task requires. Quality degrades when too much information is placed in the context window. Effective management means organizing knowledge into targeted chunks and strategically feeding the model only what is necessary for the current workload.

Security Breaches: A major vulnerability is "prompt injection," where malicious actors use crafted inputs to override system instructions and hijack operations. As of now, there is no completely bulletproof way to protect systems from these attacks. A related concern is data poisoning, where training data, particularly in proprietary or restricted models, may contain hidden malicious triggers.

Stale Knowledge: Foundational large language models are "frozen in time" at the point their training data was collected, leaving them unaware of recent real-world changes. To avoid incorrect or outdated recommendations, systems should implement grounding mechanics, such as internet access or up-to-date knowledge bases via Retrieval-Augmented Generation (RAG).

Sycophancy: This trait is intrinsic to models trained on human feedback: during the reinforcement learning process, humans tend to reward responses that are polite, gentle, and agreeable. Models are therefore optimized to please the user and agree with them rather than question the input or remain objective. Users should stay cautious when an AI offers praise or unconditional agreement, and should put processes in place to verify that responses are grounded in facts rather than false confidence.
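The targeted-chunk strategy described under Context Overflow can be sketched as a simple relevance filter. This is a minimal illustration, not a production retriever: the keyword-overlap scoring, the function names, and the sample knowledge base are all assumptions for the example; real systems typically use embedding-based similarity.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def score(chunk: str, query: str) -> int:
    """Naive relevance score: number of words the chunk shares with the query."""
    return len(tokenize(chunk) & tokenize(query))


def select_context(chunks: list[str], query: str, max_chunks: int = 2) -> list[str]:
    """Feed the model only the most relevant chunks, not the whole knowledge base."""
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    return [c for c in ranked[:max_chunks] if score(c, query) > 0]


# Hypothetical knowledge base, split into targeted chunks.
kb = [
    "Refund policy: refunds are allowed within 30 days of purchase.",
    "Shipping: orders ship within 2 business days.",
    "Security: rotate API keys every 90 days.",
]
context = select_context(kb, "How long do I have to request a refund?")
```

Only the refund chunk survives the filter here; the shipping and security chunks never reach the context window, which is the point of the technique.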
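A first line of defense against the prompt-injection attacks mentioned under Security Breaches is screening user input before it reaches the model. The pattern list below is purely illustrative and, as the lecture stresses, no such filter is bulletproof; attackers can rephrase around any fixed pattern set.

```python
import re

# Illustrative patterns only: they catch the most obvious injection phrasings,
# but this is NOT a complete or reliable defense.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"disregard the (system|above) prompt",
    r"you are now",
]


def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches a known injection phrasing."""
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in SUSPICIOUS_PATTERNS)
```

In practice this kind of check is layered with other mitigations (privilege separation, output filtering, human review for sensitive actions) rather than used on its own.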
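The RAG-style grounding recommended for Stale Knowledge boils down to assembling a prompt from freshly retrieved sources instead of relying on the model's frozen training data. A minimal sketch of that assembly step, with hypothetical function and variable names:

```python
from datetime import date


def build_grounded_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Assemble a RAG-style prompt that restricts the model to up-to-date,
    retrieved sources rather than its frozen training knowledge."""
    sources = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        f"Today is {date.today().isoformat()}.\n"
        "Answer using ONLY the sources below. If they do not contain the "
        "answer, say you do not know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Telling the model the current date and instructing it to refuse when the sources are silent are both common grounding tactics; the retrieval step itself (search, vector lookup, etc.) is out of scope for this sketch.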
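One verification process for the sycophancy problem is a second "critic" pass: after the model answers, it is prompted again with explicit instructions to challenge, rather than please. This is a hedged sketch of such a prompt builder, not a technique from the lecture itself; the wording and function name are assumptions.

```python
def build_critic_prompt(user_claim: str, model_answer: str) -> str:
    """Second-pass prompt that asks the model to critically re-examine its
    own (possibly agreeable) answer instead of optimizing for user approval."""
    return (
        "Re-examine the answer below critically. Do NOT optimize for "
        "agreement or politeness: list any factual errors or unsupported "
        "claims, and cite evidence where possible.\n\n"
        f"User claim: {user_claim}\n"
        f"Previous answer: {model_answer}"
    )
```

Running this second pass, ideally with the grounding techniques above, gives the system a structured chance to retract false confidence before the user sees it.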