Generative AI for code: Security and Impact
With the use of generative AI rising in software development, how can we ensure security?
Requirements
A GitHub repository (public or private)
Basic familiarity with GitHub Actions and YAML
Docker (optional, for running tools locally)
Python 3.8+ and pip/pipx (for running Semgrep locally)
Problem Statement
AI coding assistants like GitHub Copilot, Claude Code, and ChatGPT have fundamentally changed how software is written. Developers now generate entire functions, API endpoints, and infrastructure configs from a single prompt. The speed gains are real, but so is the security cost. Independent studies report that between 40% and 62% of AI-generated code contains exploitable vulnerabilities, not because the models are careless, but because they have no awareness of your application's risk model, internal standards, or threat landscape. They replicate patterns from their training data, good and bad alike. A model asked for an API endpoint will often produce one that accepts user input without validation, sanitization, or authorization checks, simply because the prompt didn't ask for those things. At the same time, a new class of supply chain attack called slopsquatting has emerged: models hallucinate package names that don't exist, and threat actors register those names on npm and PyPI with malicious payloads. The result is a systemic risk that grows with the amount of AI-generated code entering your development workflow.
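One inexpensive guard against slopsquatting is to reject any dependency that is not on a reviewed allowlist. The sketch below is a minimal illustration and not part of the pipeline described later; the function name `find_unapproved` and the allowlist contents are hypothetical, and package-name normalization (for example hyphens versus underscores) is deliberately left out.

```python
import re

# Hypothetical allowlist of packages the team has already reviewed.
APPROVED = {"requests", "flask", "sqlalchemy"}

def find_unapproved(requirements_text, approved=APPROVED):
    """Return package names from requirements.txt-style text that are not approved."""
    unapproved = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):    # skip blanks and pip options such as -r
            continue
        # The name is everything before the first version specifier, extra, or marker.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0].lower()
        if name and name not in approved:
            unapproved.append(name)
    return unapproved

if __name__ == "__main__":
    sample = "requests==2.32.0\nflask>=3.0\nreqeusts-toolbelt-pro==1.0  # hallucinated name\n"
    print(find_unapproved(sample))
```

A check like this could run as a pre-commit hook or an extra CI step, so that a hallucinated package name is questioned before anyone runs `pip install`.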
Summary
This article walks through building a practical, open source security pipeline for AI-assisted development using GitHub Actions, Semgrep, Trivy, TruffleHog, and Dependabot. It covers why GitHub's native security features alone are not sufficient, how a four-layer defense framework maps to real tools, and how to configure the full pipeline with three files that drop straight into any repository. Custom rule examples are included for teams that want to enforce organization-specific patterns on top of the standard rulesets.
The full pipeline configuration is available as three YAML files:
.github/workflows/security.yml
.github/dependabot.yml
.semgrep.yml
Scope
This article focuses on:
Why AI-generated code creates security risks that standard development workflows don't catch
Why GitHub's built-in security features alone are not enough
A four-layer defense framework and how it maps to open source tooling
Semgrep, Trivy, TruffleHog, and Dependabot — what each does and why each is necessary
A working GitHub Actions pipeline with all three config files explained
Writing custom Semgrep rules for organization-specific patterns
Alternatives and a quick comparison table
Honest advantages and disadvantages of this approach
The Security Gap in AI-Generated Code
The numbers are hard to ignore. Veracode's 2025 GenAI Code Security Report tested over 100 large language models across Java, Python, C#, and JavaScript and found that 45% of code samples failed security tests, introducing OWASP Top 10 vulnerabilities — with Java showing a 72% failure rate. A separate analysis by the Cloud Security Alliance put the figure at 62% when design flaws are included alongside known CVEs. The FormAI project, which generated over 112,000 programs with GPT-3.5, found that more than half contained at least one security vulnerability. Perhaps most telling is a Stanford HAI finding from 2024: developers using AI coding assistants report significantly higher confidence in the security of their code while simultaneously producing measurably less secure output. The confidence gap is the real danger — it erodes the human review instinct that has historically been the last line of defense.
On the standards side, OWASP published its Top 10 for LLM Applications in 2025, covering prompt injection, insecure output handling, training data poisoning, and supply chain vulnerabilities specific to AI-integrated systems. The OpenSSF followed with its Security-Focused Guide for AI Code Assistant Instructions — a set of copy-paste instruction templates for IDE-based AI assistants that embed security rules directly into the model's context. NIST's AI Risk Management Framework provides the governance layer above all of this. These are the frameworks to know; this article is about the tooling that implements them in practice.
The Four-Layer Defense Framework
No single tool covers the full attack surface. A robust security posture for AI-assisted development requires thinking in layers, where each layer catches what the one before it misses.
The four layers are:
Developer workbench — security rules embedded in the AI assistant before code is generated, IDE plugins for real-time scanning, and mandatory human review
CI pipeline gates — automated SAST, dependency scanning, IaC validation, and secrets scanning on every push and pull request
CD governance — policy as code enforcement, container signing, and supply chain attestation at deploy time
Runtime defense — canary prompts, semantic drift monitoring, and model sandboxing in production
This article focuses on Layer 2 — the CI pipeline — because it delivers the highest return for the least operational complexity and is where most teams should start.
Why GitHub's native security isn't enough on its own
GitHub Advanced Security (GHAS) — which includes CodeQL, Dependabot, and secret scanning — is genuinely useful, but it has real limitations that matter in an AI-assisted development environment.
CodeQL's deep semantic analysis is its strength, but it also makes it slow (30+ minutes on large repos), GitHub-only, and expensive for private repositories outside of an Enterprise license. More importantly, CodeQL covers only SAST — it does not scan your containers, your Terraform files, or your full git history for secrets. Dependabot watches your dependencies continuously, but it doesn't block PRs and it doesn't scan infrastructure as code. GitHub's secret scanning is good at detecting tokens in new commits but does not traverse the full git history.
The gap is not that GitHub's tools are bad — it's that they cover one or two categories of risk each. Closing the full attack surface requires combining them with tools that cover the categories they miss.
The Tools: What Each One Does and Why You Need It
Semgrep — SAST (static code analysis)
Semgrep scans your source code without running it, matching against patterns known to be insecure. It covers JavaScript, TypeScript, Python, Go, Java, Ruby, and more. Unlike CodeQL, rules are written in simple YAML — no custom query language to learn — and the public registry at semgrep.dev/r has thousands of community-maintained rules covering the OWASP Top 10, SQL injection, XSS, insecure cryptography, and more. Critically for AI-generated code, Semgrep catches the patterns that models reproduce most often: missing input validation, hardcoded credentials, insecure eval() calls, and outdated crypto functions.
Trivy — SCA + IaC scanning
Trivy covers two distinct categories. As an SCA tool, it scans your package.json, requirements.txt, and other dependency files against the NVD and GitHub Advisory Database, catching known CVEs in the libraries you import. As an IaC scanner, it reads your Terraform and Kubernetes YAML files for misconfigurations — missing security groups, overly permissive IAM roles, Dockerfiles running as root. This second capability is where Trivy goes beyond anything GitHub's native tooling offers. In a real-world test against the infraglue repository, Trivy immediately caught a Dockerfile running its application process as root and a missing HEALTHCHECK instruction — both introduced via AI-generated infrastructure code.
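To make the two Dockerfile findings concrete, here is a deliberately simplified sketch of the kind of check involved. It is an illustration only and does not reflect how Trivy is implemented; the function name `dockerfile_findings` and the finding labels are hypothetical, and line continuations and multi-stage builds are ignored.

```python
def dockerfile_findings(dockerfile_text):
    """Flag two common misconfigurations: running as root and a missing HEALTHCHECK."""
    instructions = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # Split into [INSTRUCTION, arguments]
            instructions.append(stripped.split(None, 1))

    findings = []
    users = [parts[1].strip() for parts in instructions
             if parts[0].upper() == "USER" and len(parts) > 1]
    # The last USER instruction decides which user the container process runs as;
    # with no USER instruction at all, the process runs as root.
    if not users or users[-1].split(":")[0] in ("root", "0"):
        findings.append("runs-as-root")
    if not any(parts[0].upper() == "HEALTHCHECK" for parts in instructions):
        findings.append("missing-healthcheck")
    return findings

if __name__ == "__main__":
    print(dockerfile_findings("FROM python:3.12\nCMD [\"python\", \"app.py\"]\n"))
```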
Dependabot — continuous dependency monitoring
Dependabot watches your dependency files 24 hours a day and opens automatic pull requests the moment a new CVE is published for any package you use — even when no code has been pushed. This is the critical gap that point-in-time scanners like Trivy cannot fill. A vulnerability published on Tuesday night will not be caught by Trivy until the next code push; Dependabot will open a fix PR by Wednesday morning. In the same infraglue test, Dependabot surfaced 18 vulnerabilities (11 high, 6 moderate, 1 low) in the existing codebase the moment dependabot.yml was pushed — vulnerabilities that had been sitting undetected in the repository.
TruffleHog — secrets scanning
TruffleHog scans your full git history for leaked secrets — API keys, tokens, passwords, and other credentials. The important detail is fetch-depth: 0 in the checkout step: without it, GitHub Actions only downloads the latest commit, meaning a secret committed two weeks ago and "deleted" in a later commit would never be found. TruffleHog with full history traversal catches exactly that scenario.
Note on Gitleaks: Gitleaks is a popular alternative to TruffleHog but recently introduced a paid license requirement for its GitHub Action. TruffleHog is fully open source with no license restrictions and is the recommended default for new setups.
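The effect of history depth on secrets scanning can be shown with a toy model. The sketch below is not TruffleHog's detection logic; it uses one simplified regular expression, a hypothetical two-commit history, and AWS's documented example key ID to show why a secret that was "deleted" in a later commit is only visible when the whole history is scanned.

```python
import re

# Simplified pattern for an AWS access key ID (AKIA followed by 16 characters).
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")

def scan_commits(commits):
    """Return the ids of commits whose added lines contain a key-like string."""
    hits = []
    for commit_id, added_lines in commits:
        if any(AWS_KEY.search(line) for line in added_lines):
            hits.append(commit_id)
    return hits

# Hypothetical history: the key is added in c1 and replaced in c2.
history = [
    ("c1", ['AWS_KEY = "AKIAIOSFODNN7EXAMPLE"']),
    ("c2", ['AWS_KEY = os.environ["AWS_KEY"]']),
]

full_history = scan_commits(history)       # comparable to fetch-depth: 0
latest_only = scan_commits(history[-1:])   # comparable to a shallow checkout

if __name__ == "__main__":
    print(full_history, latest_only)
```

The key in `c1` remains retrievable from the repository even though the current files no longer contain it, which is why a leaked credential should be rotated and not just removed.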
A Working Pipeline: The Three Files
Where the files go
your-repo/
├── .github/
│   ├── workflows/
│   │   └── security.yml      ← the main pipeline
│   └── dependabot.yml        ← continuous dependency monitoring
└── .semgrep.yml              ← semgrep config and ignore rules

Start by creating the folder structure. GitHub Actions only picks up workflow files from exactly this path; it won't look anywhere else.
bash
mkdir -p .github/workflows

File 1 — .github/workflows/security.yml
This is the main pipeline. It runs three jobs in parallel (Semgrep, Trivy, and TruffleHog) on every push and pull request targeting main, master, or develop, and once daily at 2am UTC. The daily schedule is what catches new CVEs between code pushes, even when no one has committed anything.
bash
cat > .github/workflows/security.yml << 'EOF'
name: Security Scan

on:
  push:
    branches: ["main", "master", "develop"]
  pull_request:
    branches: ["main", "master", "develop"]
  schedule:
    - cron: "0 2 * * *"   # daily at 2am UTC — catches new CVEs between pushes

# The SARIF upload steps need write access to code scanning results
permissions:
  contents: read
  security-events: write
  actions: read            # only needed for private repositories

jobs:
  semgrep:
    name: Semgrep SAST
    runs-on: ubuntu-latest
    container:
      image: semgrep/semgrep
    steps:
      - uses: actions/checkout@v4
      - name: Run Semgrep
        # --sarif and --output write the report that the upload step below expects
        run: |
          semgrep scan \
            --config "p/javascript" \
            --config "p/typescript" \
            --config "p/python" \
            --config "p/owasp-top-ten" \
            --config "p/secrets" \
            --config "p/sql-injection" \
            --config "p/xss" \
            --sarif --output semgrep.sarif
      - uses: github/codeql-action/upload-sarif@v4
        if: always()
        with:
          sarif_file: semgrep.sarif
          category: semgrep

  trivy:
    name: Trivy SCA + IaC
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Trivy — dependency scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: "fs"
          scan-ref: "."
          scanners: "vuln"
          severity: "HIGH,CRITICAL"
          format: "sarif"
          output: "trivy-deps.sarif"
          exit-code: "0"
      - uses: github/codeql-action/upload-sarif@v4
        if: always()
        with:
          sarif_file: trivy-deps.sarif
          category: trivy-deps
      - name: Trivy — IaC / HCL scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: "fs"
          scan-ref: "."
          scanners: "misconfig"
          severity: "HIGH,CRITICAL"
          format: "sarif"
          output: "trivy-iac.sarif"
          exit-code: "0"
      - uses: github/codeql-action/upload-sarif@v4
        if: always()
        with:
          sarif_file: trivy-iac.sarif
          category: trivy-iac

  trufflehog:
    name: TruffleHog secrets scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, not just the latest commit
      - name: Run TruffleHog
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
          extra_args: --only-verified
EOF

File 2 — .github/dependabot.yml
This runs entirely in the background — no push needed. GitHub reads it and starts monitoring your npm, pip, Terraform, and GitHub Actions dependencies 24/7, opening automatic fix PRs the moment a new CVE is published for anything you use.
bash
cat > .github/dependabot.yml << 'EOF'
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "daily"
    open-pull-requests-limit: 10
    labels: ["security", "dependencies"]

  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "daily"
    open-pull-requests-limit: 10
    labels: ["security", "dependencies"]

  - package-ecosystem: "terraform"
    directory: "/"
    schedule:
      interval: "weekly"
    labels: ["security", "infrastructure"]

  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    labels: ["security", "ci"]
EOF

File 3 — .semgrep.yml
Tells Semgrep which directories to skip. Without this file, Semgrep would scan node_modules, .terraform, and other generated directories — thousands of files you didn't write — and flood your results with noise. This is also where you add custom rules later.
bash
cat > .semgrep.yml << 'EOF'
rules: []

paths:
  exclude:
    - "node_modules/**"
    - ".venv/**"
    - "venv/**"
    - "dist/**"
    - "build/**"
    - "**/*.min.js"
    - "**/__pycache__/**"
    - ".terraform/**"
EOF

What each file does
security.yml runs three jobs in parallel on every push, pull request, and daily at 2am UTC:
semgrep — scans JS/TS and Python source for OWASP Top 10, XSS, SQL injection, and secrets patterns. Results upload to GitHub's Security → Code scanning tab via SARIF.
trivy — runs two scans: one for dependency CVEs (scanners: "vuln"), one for IaC misconfigurations (scanners: "misconfig"). Both upload to the Security tab.
trufflehog — scans the full git history (fetch-depth: 0) for verified leaked secrets.
The schedule: cron: "0 2 * * *" line is what closes the gap between pushes — Trivy downloads a fresh vulnerability database every time it runs, so even with no code changes, new CVEs published overnight are caught by 2am the next morning.
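The size of the gap that the daily schedule leaves open is simple arithmetic. The sketch below, with a hypothetical helper `next_daily_scan`, computes when the next 02:00 UTC run happens after a CVE is published. It assumes the run starts on time and that the vulnerability database already contains the new entry, neither of which is guaranteed in practice.

```python
from datetime import datetime, timedelta, timezone

def next_daily_scan(published, hour=2):
    """Return the first daily scan at `hour`:00 UTC at or after `published`.

    `published` is expected to be a timezone-aware datetime.
    """
    published = published.astimezone(timezone.utc)
    scan = published.replace(hour=hour, minute=0, second=0, microsecond=0)
    if scan < published:
        scan += timedelta(days=1)   # today's run already happened; wait for tomorrow's
    return scan

# A CVE published on Tuesday at 23:00 UTC is picked up by the Wednesday 02:00 run.
published = datetime(2025, 9, 2, 23, 0, tzinfo=timezone.utc)
delay = next_daily_scan(published) - published

if __name__ == "__main__":
    print(next_daily_scan(published), delay)
```

In the worst case, a CVE published just after 02:00 UTC waits almost 24 hours for the next scheduled run.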
dependabot.yml runs continuously in the background, independent of any code push. It monitors npm, pip, Terraform providers, and GitHub Actions dependencies and opens automatic PRs the moment a new CVE is published for anything you use.
.semgrep.yml tells Semgrep which directories to skip. Without it, Semgrep would scan node_modules and .terraform — thousands of files you didn't write — and generate noise instead of signal.
Writing Custom Semgrep Rules
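The public Semgrep registry covers common vulnerability patterns, but every codebase has organization-specific patterns worth enforcing. Custom rules go in .semgrep.yml and are written in YAML, with no special query language required. Because the workflow above passes only registry packs to Semgrep, add --config .semgrep.yml to the semgrep scan command so that these rules are actually loaded.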
Example 1 — Block hardcoded staging URLs
Catches staging URLs that should only ever come from environment variables, not be hardcoded in source.
rules:
  - id: no-hardcoded-staging-url
    patterns:
      - pattern: |
          "$URL"
      # metavariable-regex must be its own entry in the patterns list
      - metavariable-regex:
          metavariable: $URL
          regex: 'https?://staging\.'
    message: >
      Hardcoded staging URL detected. Use an environment variable instead:
      process.env.API_URL or os.environ['API_URL']
    languages: [javascript, typescript, python]
    severity: WARNING

Example 2 — Enforce parameterized queries in Python
Catches the pattern of string-concatenating SQL queries, which AI models frequently generate.
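For illustration, here is the kind of code this rule is meant to flag, next to the parameterized form the rule message recommends. The table, the data, and the function names are hypothetical; the example uses the standard-library `sqlite3` driver, whose placeholder is `?` where some other drivers use `%s`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

def find_user_unsafe(cursor, user_id):
    # Flagged pattern: the query text is assembled from input.
    return cursor.execute("SELECT name FROM users WHERE id = " + user_id).fetchall()

def find_user_safe(cursor, user_id):
    # Parameterized: the driver passes user_id as data, never as SQL.
    return cursor.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchall()

payload = "1 OR 1=1"
unsafe_rows = find_user_unsafe(conn.cursor(), payload)  # the injected OR clause matches every row
safe_rows = find_user_safe(conn.cursor(), payload)      # the payload is compared as a plain string

if __name__ == "__main__":
    print(len(unsafe_rows), len(safe_rows))
```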
  - id: no-string-concat-sql
    # pattern-either matches either form; a plain patterns list would require both at once
    pattern-either:
      - pattern: |
          $CURSOR.execute("..." + ...)
      - pattern: |
          $CURSOR.execute(f"...")
    message: >
      SQL query built with string concatenation or f-string.
      Use parameterized queries: cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    languages: [python]
    severity: ERROR

Example 3 — Flag eval() on any user input
Catches direct or indirect use of eval() with variables that may originate from user input.
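As a Python-only illustration of what the rule message means by a safe parser, the sketch below contrasts `eval()` with the standard library's `ast.literal_eval`, which accepts only literal values. The function names are hypothetical, and `literal_eval` is one possible replacement for a narrow use case, not a general substitute for every use of `eval()`.

```python
import ast

def parse_value_unsafe(text):
    # Flagged pattern: eval() executes arbitrary Python expressions.
    return eval(text)

def parse_value_safe(text):
    # ast.literal_eval accepts only literals such as numbers, strings, lists, and dicts.
    return ast.literal_eval(text)

ok = parse_value_safe("[1, 2, 3]")

try:
    parse_value_safe("__import__('os').getcwd()")
    rejected = False
except ValueError:
    # A function call is not a literal, so the input is refused without being run.
    rejected = True

if __name__ == "__main__":
    print(ok, rejected)
```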
  - id: no-eval-user-input
    # pattern-either flags a match on either pattern
    pattern-either:
      - pattern: eval($VAR)
      - pattern: eval(request.$ANYTHING)
    message: >
      eval() on user-controlled input enables remote code execution.
      Refactor to avoid eval entirely, or use a safe expression parser.
    languages: [javascript, typescript, python]
    severity: ERROR

To test a custom rule before committing it:
semgrep scan --config .semgrep.yml --test .

Alternatives and Quick Comparison
The recommended open source stack for most teams: Semgrep + Trivy + TruffleHog + Dependabot. Together they cover SAST, SCA, IaC, secrets, and continuous dependency monitoring with zero licensing cost.
Advantages and Disadvantages
Advantages
Fully open source — zero licensing cost for the core stack
Platform agnostic — Semgrep and Trivy run anywhere; not locked to GitHub
Covers what GitHub misses — containers, IaC, full git history, continuous monitoring in one pipeline
Results in one place — all findings surface in GitHub's Security tab via SARIF upload
Catches real issues immediately — in testing against a real-world repository, Dependabot surfaced 18 previously undetected vulnerabilities (11 high) on first run; Trivy caught Dockerfile misconfigurations introduced by AI-generated infrastructure code
Daily cron closes the CVE gap — new vulnerabilities are caught overnight even with no code changes
Custom rules — Semgrep's YAML rule format makes it straightforward to encode organization-specific patterns
Disadvantages
Exit code tuning required — Semgrep's --error flag and Trivy's exit-code: "1" will fail the pipeline before SARIF is uploaded unless configured carefully. Set exit-code: "0" on Trivy and remove --error from Semgrep; let GitHub's Security tab handle alert management.
Semgrep is static only — it does not catch runtime issues or vulnerabilities in running applications (DAST). Add OWASP ZAP for dynamic coverage.
Trivy does not cover vendored dependencies — packages copied directly into the repo rather than declared in a manifest file are invisible to Trivy.
Dependabot opens a lot of PRs — open-pull-requests-limit: 10 in the config caps this, but on large dependency trees it can still generate noise. Grouping patch updates helps.
TruffleHog only flags verified secrets — the --only-verified flag reduces false positives but means some unverified leaks won't appear. Remove the flag if you want maximum coverage at the cost of more noise.
Full git history scan is slow — fetch-depth: 0 on large repositories with long histories adds meaningful time to the TruffleHog job.
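One way to approach the exit-code trade-off is to let every scanner finish and upload its SARIF file, then decide whether to fail the build in a separate step. The sketch below is an assumption about how such a gate could look, not part of the pipeline above; the function names and the sample document are hypothetical. It treats a result without a `level` as a warning, which is the SARIF default when no rule configuration overrides it.

```python
import json

def count_by_level(sarif):
    """Count SARIF results per level across all runs."""
    counts = {}
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            level = result.get("level", "warning")
            counts[level] = counts.get(level, 0) + 1
    return counts

def should_fail(sarif, blocking_levels=("error",)):
    """Return True if any result has a level the team has chosen to block on."""
    counts = count_by_level(sarif)
    return any(counts.get(level, 0) > 0 for level in blocking_levels)

# Hypothetical, hand-written SARIF fragment for illustration.
sample = json.loads("""
{"version": "2.1.0", "runs": [{"results": [
  {"ruleId": "no-string-concat-sql", "level": "error", "message": {"text": "..."}},
  {"ruleId": "no-hardcoded-staging-url", "message": {"text": "..."}}
]}]}
""")

if __name__ == "__main__":
    print(count_by_level(sample), should_fail(sample))
```

Run after the upload steps, a gate like this keeps findings visible in the Security tab while still allowing the team to block merges on the severities it cares about.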
Conclusion
AI-assisted development is not going away, and neither is the security risk it introduces. The research is clear: models generate vulnerable code at a rate that manual review alone cannot keep pace with, and developer confidence in AI output actively reduces the scrutiny that has historically caught these issues.
The good news is that the tooling exists, it is largely open source, and it can be dropped into any GitHub repository in minutes. The four-layer defense framework maps the problem space; this pipeline implements Layer 2 — the CI gates — which is where most of the return is. Semgrep catches vulnerable code patterns before they merge. Trivy catches vulnerable dependencies and misconfigured infrastructure. TruffleHog catches secrets in the commit history. Dependabot watches for new CVEs around the clock.
None of these tools requires AI to be effective against AI-generated vulnerabilities. They work on all code equally — which is precisely the point. The developer remains responsible for what ships. These tools are what make that responsibility tractable at machine speed.
References
Cloud Security Alliance. Understanding Security Risks in AI-Generated Code. July 2025. https://cloudsecurityalliance.org
Veracode. 2025 GenAI Code Security Report. September 2025. https://veracode.com
Pearce, H. et al. Asleep at the Keyboard? Assessing the Security of GitHub Copilot Code Contributions. IEEE S&P, 2022.
ANSSI / BSI. AI Coding Assistants — Security Considerations. Joint Report, 2024.
FormAI Project. A Comprehensive Study of LLM Secure Code Generation. 2024.
Stanford HAI. The Confidence Gap in AI-Assisted Coding. 2024.
OWASP GenAI Security Project. OWASP Top 10 for LLM Applications 2025. https://genai.owasp.org
OWASP. Top 10 for Agentic AI Applications. 2025. https://genai.owasp.org
OpenSSF Best Practices Working Group. Security-Focused Guide for AI Code Assistant Instructions. August 2025. https://best.openssf.org
OpenSSF / Linux Foundation. Visualizing Secure MLOps: A Practical Guide for Building Robust AI/ML Pipeline Security. August 2025.
NIST. AI Risk Management Framework (AI RMF 1.0). NIST AI 100-1, 2023. https://nist.gov/artificial-intelligence
DORA Research Program. State of DevOps Report 2025. Google Cloud, 2025.
Toulas, B. AI-hallucinated code dependencies become new supply chain risk. BleepingComputer, 2025.
Dai, S., Xu, J., Tao, G. A Comprehensive Study of LLM Secure Code Generation. IEEE Security & Privacy, 2024.