Building an AI-Powered Smart Contract Security Auditor: From Fine-Tuning to Deployment
--
How I fine-tuned a 7B LLM on the SmartBugs dataset and deployed a fully functional Solidity vulnerability detector — completely for free.
The Problem
Smart contract security is one of the most critical challenges in the blockchain industry. Since 2016, over $3 billion has been lost to smart contract exploits — reentrancy attacks, integer overflows, access control flaws. Most of these vulnerabilities are well-known, well-documented, and yet developers keep shipping them to production.
The traditional solution is manual auditing — expensive, slow, and not scalable. Tools like Slither and MythX help, but they’re either purely rule-based or require paid subscriptions.
I wanted to build something different: an AI auditor that understands code semantically, not just through regex patterns.
The Architecture
The system has three layers working together:
[Solidity Code]
│
▼
[Pattern Analysis] ←── Fast, always-on regex layer
│
▼
[LLM Analysis] ←── Fine-tuned DeepSeek-Coder-7B
│
▼
[Audit Report] ←── Risk level, vulnerabilities, recommendations

Why two layers? Pattern analysis is instant and reliable for known vulnerability signatures. The LLM adds semantic understanding — it can reason about why something is vulnerable and suggest specific fixes. The combination is more robust than either alone.
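The pattern layer is simple by design. Here is a minimal sketch of what such a regex layer can look like (the rules and names are illustrative, not the project's actual rule set); findings in this shape feed straight into the training-example builder shown later:

import re

# Illustrative vulnerability signatures (not the project's real rule set)
PATTERNS = [
    {'name': 'Low-level call with value (possible reentrancy)',
     'severity': 'HIGH',
     'regex': re.compile(r'\.call\{value:')},
    {'name': 'tx.origin authorization',
     'severity': 'MEDIUM',
     'regex': re.compile(r'tx\.origin')},
    {'name': 'Timestamp dependence',
     'severity': 'LOW',
     'regex': re.compile(r'block\.timestamp|\bnow\b')},
]

def pattern_scan(code: str) -> list:
    """Return a finding for every known signature present in the code."""
    return [{'name': p['name'], 'severity': p['severity']}
            for p in PATTERNS if p['regex'].search(code)]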
The Dataset: SmartBugs
I used the SmartBugs Curated Dataset — a collection of vulnerable Solidity contracts annotated with vulnerability categories. It contains contracts with real-world vulnerabilities including:
- Reentrancy (like the DAO hack)
- Integer overflow/underflow
- Unchecked external calls
- Access control issues
- Timestamp dependence
- tx.origin authorization
For each contract, I built a training example pairing the raw Solidity code with a structured audit report generated by the pattern analyzer. This gave me ~2,000 labeled examples.
def build_prompt(code, findings):
    if findings:
        report = '\n'.join(f"- [{f['severity']}] {f['name']}"
                           for f in findings)
        answer = (
            f"## Security Audit Report\n\n"
            f"⚠️ Vulnerabilities detected:\n{report}\n\n"
            f"Please review each finding and apply the recommended "
            f"fixes before deploying to production."
        )
    else:
        # No findings from the pattern layer: emit a clean report
        answer = "## Security Audit Report\n\n✅ No issues detected."
    return {'input': f"Analyze this contract:\n```solidity\n{code}\n```",
            'output': answer}

Fine-Tuning with LoRA
I chose DeepSeek-Coder-7B-Instruct as the base model. It’s specifically trained on code, understands Solidity syntax, and at 7B parameters fits comfortably on Google Colab Pro’s A100 GPU with 4-bit quantization.
LoRA (Low-Rank Adaptation) made fine-tuning feasible on a single GPU. Instead of updating all 7 billion parameters, LoRA adds small trainable matrices to the attention layers — reducing trainable parameters from 7B to just ~10M.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'v_proj', 'k_proj', 'o_proj'],
    lora_dropout=0.05,
    task_type=TaskType.CAUSAL_LM,
)

Training took about 25 minutes on an A100 GPU with these settings:
- 5 epochs
- Batch size 4 with gradient accumulation
- bf16 precision (A100 native)
- Cosine learning rate schedule
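For reference, those settings map onto a standard Hugging Face TrainingArguments roughly like this (a sketch: the learning rate, accumulation steps, and output path are my assumptions, not values from the notebook):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='auditor-lora',        # assumed output path
    num_train_epochs=5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,    # assumed value
    learning_rate=2e-4,               # assumed; a common LoRA default
    lr_scheduler_type='cosine',
    bf16=True,                        # A100-native precision
    logging_steps=10,
)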
The Critical Step: Merging LoRA Weights
This is where I hit my first major issue. After training, I pushed the LoRA adapter to Hugging Face Hub — but the HF Inference API returned a 404: Cannot POST /models/....
The reason: LoRA adapters are differential weights — they require the base model to be loaded first. HF Inference API can’t handle this automatically for custom models.
The solution is to merge the adapter into the base model before pushing:
# ❌ What I pushed first (doesn't work with Inference API):
# adapter_config.json + adapter_model.safetensors

# ✅ What you need to push (complete merged model):
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="cpu",  # CPU to avoid OOM during merge
)
model_with_lora = PeftModel.from_pretrained(base_model, checkpoint_path)
merged_model = model_with_lora.merge_and_unload()  # ← key step
merged_model.push_to_hub(repo_id)

Important: Do this on CPU, not GPU. After training, the GPU is nearly full. Loading the base model again (in fp16, without 4-bit quantization) on the same GPU causes OOM. Loading on CPU uses RAM instead.
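If the training objects are still alive in the same notebook, it also helps to free them explicitly before the merge (the variable names below are assumptions about what the training cell defined):

import gc
import torch

# Drop references to the training-time objects, then release cached GPU memory
del model, trainer  # names assumed from the training cell
gc.collect()
torch.cuda.empty_cache()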
Deployment: Fully Free Stack
Getting to zero cost required some creativity:
Training → Google Colab Pro
A100 GPU, ~25 minutes per training run. Not permanently free but affordable.
Model Storage → Hugging Face Hub
Free unlimited storage for public models.
API + UI → Hugging Face Spaces
This is where it gets interesting. HF Spaces provides free CPU containers. The 7B model with 4-bit quantization (~4GB) loads and runs inference on CPU — slowly (~3–5 minutes per request on CPU basic), but it works.
For production speed, I upgraded to a paid GPU Space, but the CPU version is fully functional for demos.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    HF_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    low_cpu_mem_usage=True,
)

Lessons Learned
1. Always merge LoRA before deploying. The adapter-only approach works for local inference where you control the environment (see the sketch after this list), but any production API expects a complete model.
2. Free GPU tiers have hard memory limits. Colab Pro’s A100 has 40GB, which sounds like a lot until you’re loading a 13B model twice (once for training, once for merging). Always run del model; torch.cuda.empty_cache() before the merge step.
3. Pattern analysis is underrated. The regex-based layer catches ~80% of common vulnerabilities instantly, with zero latency and zero cost. The LLM adds value by explaining why something is vulnerable and suggesting fixes, but don’t underestimate simple pattern matching.
4. 7B > 13B for deployment. CodeLlama-13B produced slightly better audit reports, but the deployment friction was not worth it. DeepSeek-Coder-7B hits a sweet spot of quality, speed, and deployability.
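For contrast with the merged push earlier, this is roughly what the adapter-only route from lesson 1 looks like for local inference (a sketch: ADAPTER_REPO is a placeholder, and BASE_MODEL is the same base-model id used during training):

from peft import PeftModel
from transformers import AutoModelForCausalLM

ADAPTER_REPO = "you/your-lora-adapter"  # hypothetical adapter repo id

# Adapter-only loading: fine locally, where you pair the adapter with
# its base model yourself; the hosted Inference API won't do this for you
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)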
Results
The deployed system can detect:
╔════════════════════════╦════════════════════╦════════════════╗
║ Vulnerability ║ Severity ║Detection Method║
╠════════════════════════╬════════════════════╬════════════════╣
║ Reentrancy ║ 🚨 HIGH ║ Pattern + LLM ║
║ Integer Overflow ║ 🚨 HIGH ║ Pattern + LLM ║
║ Unchecked Call Returns ║ ⚠️ MEDIUM ║ Pattern + LLM ║
║ tx.origin Authorization║ ⚠️ MEDIUM ║ Pattern + LLM ║
║ Missing Access Control ║ ⚠️ MEDIUM ║ Pattern + LLM ║
║ Timestamp Dependence ║ ℹ️ LOW ║ Pattern + LLM ║
╚════════════════════════╩════════════════════╩════════════════╝

For the classic reentrancy contract (VulnerableBank), the model correctly identifies the vulnerability and outputs:
“The withdraw function sends ETH before updating the balance. An attacker can recursively call withdraw() before the balance is decremented, draining the contract. Use the checks-effects-interactions pattern and consider OpenZeppelin’s ReentrancyGuard.”
Try It Yourself
The full project is available on Hugging Face Spaces: https://parsa2025ai-auditagent.hf.space
The complete codebase — training notebook, FastAPI backend, and frontend — is modular and reusable. All free to use.
What’s Next
- Expanding the training dataset with more recent exploits (Euler Finance, Ronin Bridge)
- Adding support for multi-file contract analysis
- Integrating with Foundry and Hardhat as a pre-deployment hook
- Exploring smaller, faster models (Phi-3, Qwen2.5-Coder-1.5B) for lower latency on CPU
If you found this useful, the model is at huggingface.co/Parsa2025AI/smart-contract-auditor. Feedback and contributions welcome.