Generative AI and Social Engineering Threats
Generative AI has materially expanded the technical capability available to threat actors conducting social engineering attacks, lowering the skill threshold required to produce convincing deceptive content at scale. This page covers the definition and scope of generative AI-enabled social engineering, the mechanisms by which these attacks operate, the primary attack scenarios active across US organizational environments, and the decision boundaries that distinguish AI-assisted from AI-generated threat categories. The intersection of large language models, synthetic media, and classical manipulation tradecraft represents one of the most structurally significant shifts in the threat landscape covered across these AI Cyber Listings.
Definition and scope
Social engineering in cybersecurity refers to manipulation techniques that exploit human psychology rather than technical vulnerabilities to obtain unauthorized access, credentials, or sensitive information. Generative AI enters this domain as a force multiplier: it does not create new attack categories but removes the effort, language, and personalization barriers that previously constrained attack volume and believability.
The scope of this threat covers large language model (LLM)-generated phishing and spear-phishing content, voice synthesis-based vishing (voice phishing), deepfake video for impersonation fraud, and automated persona construction for long-duration social manipulation. The Cybersecurity and Infrastructure Security Agency (CISA) has identified AI-enhanced phishing and deepfake-enabled fraud among the priority threat vectors in its FY2024–2026 Cybersecurity Strategic Plan, placing these threats within national critical infrastructure risk frameworks.
The National Institute of Standards and Technology (NIST) addresses AI-specific risks in the AI Risk Management Framework (AI RMF 1.0), which treats adversarial manipulation of AI outputs and AI-assisted deception under its trustworthiness characteristics, including secure and resilient operation. These classification boundaries provide the regulatory vocabulary used across federal agency assessments of generative AI threats.
How it works
Generative AI-enabled social engineering operates through a pipeline that can be broken into four discrete phases:
- Reconnaissance and targeting. Open-source intelligence (OSINT) collection identifies targets, organizational hierarchies, communication styles, and relationships. AI tools automate aggregation of publicly available data from LinkedIn, corporate websites, and social platforms to build target profiles at scale.
- Content generation. LLMs produce phishing emails, SMS lures (smishing), and script-based vishing calls calibrated to the target's professional context, language register, and organizational role. Unlike template-based attacks, LLM-generated content does not replicate detectable boilerplate patterns and passes standard grammar and tone filters.
- Synthetic media production. Voice cloning models trained on publicly available audio (earnings calls, interviews, public speeches) generate real-time or pre-recorded audio impersonating executives or trusted contacts. Deepfake video synthesis extends this to visual impersonation in video conferencing contexts. The FBI's Internet Crime Complaint Center (IC3) has issued public notices specifically addressing CEO fraud and Business Email Compromise (BEC) augmented by voice synthesis; the IC3 2023 Annual Report recorded $2.9 billion in BEC losses.
- Delivery and exploitation. Generated content is deployed through standard attack channels (email, SMS, phone, or video), with AI tools also capable of maintaining conversational coherence across multi-turn interactions, extending the window of deception beyond single-message attacks.
The contrast with traditional social engineering is structural: pre-AI phishing required per-target manual effort, limiting scale. LLM-assisted campaigns reduce per-target marginal cost to near zero, enabling simultaneous high-personalization attacks across thousands of individuals.
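A toy cost model makes the scaling claim concrete. Every figure below is an assumption chosen only to show the order of magnitude, not a measured value.

```python
# Illustrative arithmetic only: all inputs are assumed, not measured.
manual_minutes_per_target = 45   # hand-crafted spear-phish (assumed)
llm_minutes_per_target = 0.5     # human review time per generated lure (assumed)
targets = 5_000

manual_hours = manual_minutes_per_target * targets / 60  # 3,750 hours
llm_hours = llm_minutes_per_target * targets / 60        # ~42 hours
print(f"manual: {manual_hours:,.0f} h, LLM-assisted: {llm_hours:,.0f} h")
```

Under these assumed numbers, the same 5,000-target campaign drops from roughly 3,750 attacker-hours to about 42, which is the structural shift described above.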
Common scenarios
The primary generative AI social engineering scenarios active in the US threat environment include:
- Executive impersonation via voice synthesis. Fraudulent audio calls or voice messages replicating a CFO or CEO directing wire transfers or credential sharing. IC3 BEC reports categorize this as a sub-variant of wire fraud under 18 U.S.C. § 1343.
- AI-generated spear-phishing. Emails that reference accurate internal project names, recent organizational events, or correct reporting relationships — details synthesized from OSINT — to defeat skepticism filters trained on generic lure recognition.
- Synthetic persona development. Long-duration LinkedIn or email relationships built by AI-maintained personas posing as vendors, recruiters, or researchers, designed to extract sensitive documents or internal access credentials over weeks.
- Deepfake video conferencing. Real-time or pre-recorded video impersonation used in onboarding fraud, vendor verification bypasses, or internal authorization requests. The Federal Trade Commission (FTC) has explicitly addressed voice cloning and AI impersonation as deceptive practices subject to Section 5 of the FTC Act.
For an overview of how AI-enabled threats are categorized within the broader cybersecurity services sector, the AI Cyber Directory Purpose and Scope describes the organizational structure applied across this reference environment.
Decision boundaries
Practitioners and researchers categorizing these threats apply two primary classification distinctions:
AI-assisted vs. AI-generated: AI-assisted attacks use generative tools to improve or refine human-authored content. AI-generated attacks produce the full deceptive artifact autonomously, including personalization, tone calibration, and structural coherence. The operational distinction matters for detection methodology — AI-assisted content retains partial human stylistic signatures; AI-generated content does not.
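One way to probe that stylistic-signature claim is stylometric triage. The sketch below computes a single crude feature, variation in sentence length (often called burstiness), as a proxy for residual human style; the feature choice is an illustrative assumption, and no single feature reliably separates AI-assisted from AI-generated text in practice.

```python
# Sketch of one stylometric proxy feature; illustrative, not a detector.
import re
import statistics


def sentence_length_burstiness(text: str) -> float:
    """Standard deviation of sentence lengths in words (a crude style feature)."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # too little text to measure variation
    return statistics.stdev(lengths)


# Higher variation is weakly associated with human-edited text in some
# published detection work; treat any threshold as campaign-specific.
print(sentence_length_burstiness(
    "Short one. Then a much longer sentence follows right here. Ok."
))
```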
Shallow vs. deep synthetic media: Shallow synthesis covers text and audio generation, which require minimal compute and are broadly accessible. Deep synthesis covers photorealistic video and real-time avatar generation, which require greater infrastructure but are commercially available through consumer-grade tools as of 2024. NIST's Face Recognition Vendor Test (FRVT) program documents the accuracy trajectory of face recognition and synthetic face detection benchmarks relevant to deepfake threat assessment.
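Both boundaries can be applied together as a two-axis triage label. The hypothetical helper below follows the category names defined above; the modality sets and function signature are assumptions for illustration.

```python
# Hypothetical two-axis triage label combining both decision boundaries.
from dataclasses import dataclass

SHALLOW_MODALITIES = {"text", "audio"}          # shallow synthesis, per the text
DEEP_MODALITIES = {"video", "realtime_avatar"}  # deep synthesis, per the text


@dataclass
class ThreatLabel:
    generation: str  # "ai_assisted" or "ai_generated"
    synthesis: str   # "shallow" or "deep"


def label_artifact(modality: str, human_authored_core: bool) -> ThreatLabel:
    """Assign both classification axes to a single observed artifact."""
    generation = "ai_assisted" if human_authored_core else "ai_generated"
    if modality in SHALLOW_MODALITIES:
        synthesis = "shallow"
    elif modality in DEEP_MODALITIES:
        synthesis = "deep"
    else:
        raise ValueError(f"unrecognized modality: {modality}")
    return ThreatLabel(generation, synthesis)


print(label_artifact("audio", human_authored_core=False))
# ThreatLabel(generation='ai_generated', synthesis='shallow')
```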
The How to Use This AI Cyber Resource page describes how practitioners can navigate threat category listings within this reference environment. Organizations assessing exposure to generative AI social engineering threats operate under overlapping regulatory frameworks including CISA's Binding Operational Directives, FTC Act Section 5 deceptive practices provisions, and sector-specific guidance from the Financial Crimes Enforcement Network (FinCEN) on BEC and wire fraud.
References
- CISA Cybersecurity Strategic Plan FY2024–2026 — Cybersecurity and Infrastructure Security Agency
- NIST AI Risk Management Framework (AI RMF 1.0) — National Institute of Standards and Technology
- IC3 2023 Annual Report — FBI Internet Crime Complaint Center
- FTC on Voice Cloning and AI Deception — Federal Trade Commission
- NIST Face Recognition Vendor Test (FRVT) — National Institute of Standards and Technology
- FinCEN Advisory on BEC — Financial Crimes Enforcement Network