How to Test AI-Generated Code: A Practical Checklist

By Tom Pinder · 6 min read

You used Cursor to scaffold a new API route. Copilot filled in the database query. Claude wrote the frontend form. Everything compiles. The happy path works.

But is it actually correct?

AI-generated code has a specific failure pattern: it works for the case you described and breaks for the cases you didn't. The AI optimizes for your prompt, not for production. That means testing AI-generated code requires a different approach than testing hand-written code — what we call the vibe coding QA gap.

Here's the checklist.

The AI Code Testing Checklist

1. Check Input Validation

AI-generated code frequently skips input validation. If you prompted "create a form that saves user profiles," the AI builds the save logic but often omits:

  • Type checking: Does it handle numbers where strings are expected?
  • Length limits: What happens with a 50,000-character name?
  • Special characters: Does it escape HTML, SQL, and shell metacharacters?
  • Empty inputs: Does it handle null, undefined, empty string, and whitespace-only?
  • Negative numbers: If there's a quantity field, what happens with -1?

Test: Submit the form with deliberately wrong inputs. If you see a 500 error or unescaped content, the AI missed validation.
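As a sketch, the validation the AI tends to omit might look like this in TypeScript (the field names and the 200-character limit are illustrative, not prescriptive):

```typescript
// Sketch: validating a profile payload before it reaches the save logic.
type ProfileInput = { name?: unknown; quantity?: unknown };
type ValidationResult = { ok: boolean; errors: string[] };

function validateProfile(input: ProfileInput): ValidationResult {
  const errors: string[] = [];

  // Empty inputs: null, undefined, empty string, whitespace-only
  if (typeof input.name !== "string" || input.name.trim() === "") {
    errors.push("name is required and must be a non-empty string");
  } else if (input.name.length > 200) {
    // Length limits: reject a 50,000-character name outright
    errors.push("name must be 200 characters or fewer");
  }

  // Type checking and negative numbers on a quantity field
  if (typeof input.quantity !== "number" || !Number.isInteger(input.quantity)) {
    errors.push("quantity must be an integer");
  } else if (input.quantity < 0) {
    errors.push("quantity must not be negative");
  }

  return { ok: errors.length === 0, errors };
}
```

Wrong inputs should come back as structured errors, never a 500.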

2. Verify Auth and Authorization

AI assistants generate functional code, but they don't always understand your auth architecture. Common gaps:

  • Missing auth middleware: the route works without a session
  • No workspace/tenant scoping: users can access other users' data by changing an ID in the URL
  • Role checks absent: any authenticated user can perform admin actions
  • Token validation skipped: API accepts expired or malformed tokens

Test: Hit the endpoint without authentication. Try accessing resources that belong to a different user. If either works, you have a security hole.
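The two checks that test probes can be sketched as plain functions (the session and resource shapes here are illustrative, not a specific framework's API):

```typescript
// Sketch: is there a session at all, and is the resource scoped to it?
type Session = { userId: string; tenantId: string; role: "admin" | "member" } | null;
type Resource = { ownerTenantId: string };

function canAccess(session: Session, resource: Resource): boolean {
  if (!session) return false;                         // missing-auth gap
  return session.tenantId === resource.ownerTenantId; // tenant-scoping gap
}

function canAdminister(session: Session): boolean {
  return session !== null && session.role === "admin"; // role-check gap
}
```

The point is that both checks must run on every request, not just the ones you thought to prompt for.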

3. Audit Error Handling

AI-generated code tends to handle the happy path and let errors propagate as unhandled exceptions. Look for:

  • Try/catch coverage: Are database calls, API calls, and file operations wrapped?
  • Error responses: Does a failure return a user-friendly message or a raw stack trace?
  • Partial failure: If step 2 of 3 fails, does step 1 get rolled back?
  • Rate limiting: Is there protection against rapid repeated requests?

Test: Disconnect from the database (or mock a failure) and hit the endpoint. You should get a clean error response, not a crash.
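A minimal sketch of that clean error response, with the database call injected so a failure can be mocked (fetchUser is a stand-in, not a real API):

```typescript
// Sketch: wrap the database call so a failure becomes a clean response,
// not an unhandled exception.
type User = { id: string; name: string };
type ApiResponse = { status: number; body: { data: User } | { error: string } };

async function getUserHandler(
  fetchUser: (id: string) => Promise<User>,
  id: string
): Promise<ApiResponse> {
  try {
    const user = await fetchUser(id);
    return { status: 200, body: { data: user } };
  } catch {
    // User-friendly message; no raw stack trace leaked to the client
    return { status: 503, body: { error: "Service temporarily unavailable" } };
  }
}
```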

4. Examine Database Queries

AI loves to write database queries that work on sample data but fail at scale or with edge cases:

  • SQL injection: Is the query parameterized or does it concatenate user input?
  • N+1 queries: Does it query inside a loop instead of batching?
  • Missing indexes: Will this query scan the full table in production?
  • Transaction boundaries: Are related writes atomic?
  • Column types: Does the query use the right types for comparison?

Test: Check the generated SQL manually. Look for string interpolation (${userId} in a query string instead of a parameterized @userId). Run an EXPLAIN on the query with production-scale data.
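Side by side, the anti-pattern and the fix look like this (the { text, values } shape mirrors what many Postgres clients accept, but treat it as illustrative rather than any specific driver's API):

```typescript
// Vulnerable: user input concatenated straight into the SQL text
function unsafeQuery(userId: string): string {
  return `SELECT * FROM users WHERE id = '${userId}'`;
}

// Parameterized: the driver sends the value separately from the SQL text
function safeQuery(userId: string): { text: string; values: string[] } {
  return { text: "SELECT * FROM users WHERE id = $1", values: [userId] };
}
```

Feed both a classic payload like 1' OR '1'='1 and the difference is obvious: the unsafe version bakes it into the SQL, the safe version keeps it inert in the values array.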

5. Test Concurrent Access

AI-generated code almost never handles concurrency. If two users perform the same action simultaneously:

  • Race conditions: Two users claim the last item in inventory
  • Duplicate submissions: Double-clicking a button creates two records
  • Stale reads: User A sees old data after User B updates it
  • Deadlocks: Two transactions lock the same rows in different order

Test: Open two browser tabs and perform the same action at the same time. Check if the data is consistent.
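For the duplicate-submission case specifically, one common fix is an idempotency key. A sketch (in production the seen-keys store would live in the database with a unique constraint, not in process memory):

```typescript
// Sketch: a double-clicked button sends the same key twice; only the
// first submission creates a record.
const seenKeys = new Set<string>();

function submitOnce(idempotencyKey: string, create: () => void): boolean {
  if (seenKeys.has(idempotencyKey)) return false; // duplicate: no second record
  seenKeys.add(idempotencyKey);
  create();
  return true;
}
```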

6. Validate Data Transformations

When AI moves data between formats (API response to database, form data to API, CSV to objects), check:

  • Field mapping: Are all fields transferred, or did some get dropped?
  • Type coercion: Does "123" become the number 123 or stay a string?
  • Timezone handling: Are dates stored in UTC and displayed in local time?
  • Encoding: Do special characters survive the round trip?
  • Null handling: Does null become "null" (the string), undefined, 0, or empty string?

Test: Create a record with edge-case values (unicode, emojis, very long strings, dates in different timezones). Read it back and compare.
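That round trip can be sketched as a serialize/deserialize pair (field names are illustrative): dates normalized to UTC ISO strings, and null staying null rather than becoming the string "null".

```typescript
// Sketch: a record crossing a format boundary and coming back intact.
type Profile = { name: string; nickname: string | null; createdAt: Date };

function serialize(p: Profile): string {
  return JSON.stringify({
    name: p.name,
    nickname: p.nickname,                 // null survives as JSON null
    createdAt: p.createdAt.toISOString(), // stored in UTC
  });
}

function deserialize(s: string): Profile {
  const o = JSON.parse(s);
  return { name: o.name, nickname: o.nickname, createdAt: new Date(o.createdAt) };
}
```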

7. Review API Contracts

AI-generated APIs often return inconsistent response shapes:

  • Success response: Does it match what the frontend expects?
  • Error response: Is it the same shape as success (with an error field) or a completely different structure?
  • Status codes: Does a validation error return 400 or 500?
  • Pagination: Are there limits on list endpoints, or can a request return 100K records?

Test: Call the API with valid data, invalid data, and no data. Compare the response shapes. They should be predictable.
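One way to force that predictability is a single response envelope for success and failure, so the frontend branches on one field. A sketch (the shape is illustrative):

```typescript
// Sketch: success and error share one envelope, discriminated on `ok`.
type ApiResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: { code: string; message: string } };

function success<T>(data: T): ApiResult<T> {
  return { ok: true, data };
}

function failure(code: string, message: string): ApiResult<never> {
  return { ok: false, error: { code, message } };
}
```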

8. Check for Hardcoded Values

AI-generated code frequently contains values that should be configurable:

  • URLs: http://localhost:3000 baked into production code
  • API keys: Test keys or placeholder secrets in source
  • Magic numbers: Limits, timeouts, and thresholds without explanation
  • Feature flags: Boolean conditions that should be environment-dependent

Test: Search the generated files for localhost, 127.0.0.1, TODO, FIXME, xxx, and any string that looks like a key or secret.
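The fix is to lift those values into configuration that fails loudly when missing. A sketch (the variable names are illustrative):

```typescript
// Sketch: hardcoded values moved to environment configuration.
function loadConfig(env: Record<string, string | undefined>) {
  const baseUrl = env.API_BASE_URL; // not http://localhost:3000 in source
  if (!baseUrl) throw new Error("API_BASE_URL is not set");

  const apiKey = env.API_KEY;       // never a placeholder secret in source
  if (!apiKey) throw new Error("API_KEY is not set");

  // Magic numbers get names and environment overrides
  const requestTimeoutMs = Number(env.REQUEST_TIMEOUT_MS ?? "5000");

  return { baseUrl, apiKey, requestTimeoutMs };
}
```

Call it with process.env at startup, and a missing value crashes at boot instead of misbehaving in production.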

The Meta-Rule: Test What You Didn't Prompt

The most important thing to test in AI-generated code is everything you didn't explicitly ask for.

You said "build a settings page." You didn't say "validate inputs on the settings page." You didn't say "make sure the settings page requires authentication." You didn't say "handle the case where the database is down when the user saves settings."

The AI delivered what you asked for. The bugs live in what you didn't ask for.

This is why an AI QA tool is so effective for AI-generated code: an AI test case generator reads the code itself rather than your prompt, so it systematically tests the things you forgot to ask for.

Automate the Checklist

Running this checklist manually on every feature is possible but tedious. An AI QA tool like VibeProof automates most of these checks:

  • Input validation gaps are flagged as test cases
  • Auth and authorization holes are detected from route analysis
  • Error handling coverage is assessed from code structure
  • Database query patterns are reviewed for injection and performance
  • All findings come as structured test cases you can act on

The checklist above is for when you want to test manually. For everything else, let AI do the testing.

Ready to stop shipping bugs?

VibeProof reads your codebase and writes your test cases. Start free with BYOK.

Get started free
