NashTech Blog

Integrating AI With Automated Reconnaissance Penetration Testing

Table of Contents

With the advancement of technology over the years, a blended approach of AI based algorithms with reconnaissance frameworks is transforming how penetration testers collect and evaluate information around targets as well as vulnerabilities. This integration positively impacts the automated reconnaissance phase, which also happens to be the first step of ethical hacking engagements.

Traditional penetration testing philosophies rely on accuracy, which risks dramatically underestimating the time-consuming activities one can do outside of work.

The role of AI in guiding tools has digitally changed the landscape of traditional penetration testing, where now the first step can be automated. Taking one of the most innovative steps towards improving reconnaissance, the use of AI aims towards ensuring that every detail is taken into account.

Understanding Reconnaissance in Penetration Testing

Reconnaissance is understood as the initial phase of gathering information on potential domains and for the purpose of developing an offensive cybersecurity strategy, is best defined as preparation for a planned security breach.

The precept of gathering information on domains actively engages clients on various levels, for example, Domain, subdomain, IP address and geolocation registries, weather and available open ports, DNS conceals, Web server details, accolades of employees and wherever else credentials may be found – basically everything remotely available on the internet.

As of today, reconnaissance is available semi-remotely using Nmap, Amass, WhatWeb, theHarvester, Shodan and others. The challenge with the tools is that it does capture vital information, however, limited knowhow in utilizing AI on big data leads to predefined actionable outputs.

Why Integrate AI With Automated Recon?

AI can enhance the effectiveness of automated reconnaissance in several impactful ways:

1. Correlation and Prioritization of Data

Tools that conduct reconnaissance create enormous amounts of data. AI models have the ability to correlate data across multiple tools within seconds and suggest valuable patterns. Rather than examine hundreds of IPs manually, an AI model could flag which IPs have a high likelihood of finding vulnerabilities based on available threat intel and historical actions.

2. Adaptive Recon Tactics

AI facilitates dynamic decision-making. If an AI engine identifies that a target is using old CMS software, it can switch to utilizing specialized tools for CMS-specific vulnerabilities. This dynamic action dramatically minimizes blind spots.

3. Machine Learning for Anomaly Detection

AI can be taught to detect unusual configurations, misconfigured DNS records, open sensitive files (e.g.,.env,.git), and even detect honeypots. These features are more advanced than static checks by learning from actual attack datasets.

5. Threat Scoring and Reporting

AI can generate threat scores automatically for assets found, assisting penetration testers in prioritizing their work. It can also auto-generate initial reports, eliminating hours of documentation work.

Challenges and Considerations

As great as the power of AI is, there are difficulties:

  • False Positives/Negatives: ML models might incorrectly classify vulnerabilities with inadequate training data.
  • Data Privacy: OSINT collection has to honor legal and ethical limits.
  • Model Maintenance: Ongoing retraining and validation are required to maintain models as effective.
  • Tool Compatibility:  Smooth integration of AI with conventional tools can be technically challenging.

Integrate Gemini AI To Our Recon Script

For this post, we will integrate Gemini AI, Google’s multimodal large language model with our recon script. Integrating Gemini into your recon script allows you to automate decision-making, evaluate tool outputs in context, and even conduct natural language-based OSINT — all in real-time.

This integration makes your recon script an intelligent assistant, able to recommend attack vectors, summarize revealed risks, and correlate threat intelligence from sources.

Steps To Integrate Gemini AI With Your Script

Pre-requisites:

1. Python 3.9+
2.  Google’s Gemini API access through Google Cloud
3.  A current recon script using tools such as Amass, Nmap, or Nuclei
4. google-generativeai SDK installed (pip install google-generativeai)

Step 1: Set Up Gemini AI Access

Sign in to Google AI Studio.
Generate an API key from the API Access section.
Turn on the Generative Language API from your Google Cloud Console.

Step 2: Install Gemini SDK
pip install google-generativeai

Step 3: Import and Authenticate in Your Script

Let us import the authenticate the gemini API with your API key

import google.generativeai as genai
genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel('gemini-2.0-flash') 

Step 4: Analyze Recon Tool Output With Gemini
def analyze_with_gemini(recon_data, api_key):
    try:
        genai.configure(api_key=api_key)
        model = genai.GenerativeModel("gemini-2.0-flash")

        prompt = f"""
        Analyze the following reconnaissance JSON report for the domain `{recon_data.get("domain")}`:

        {json.dumps(recon_data, indent=2)}

        Please provide:
        1. Key positive findings (what looks secure or well-configured)
        2. Key concerns or risks identified
        3. Specific actionable next steps to improve security posture

        Be concise and structured.
        """

        response = model.generate_content(prompt)
        return response.text

    except Exception as e:
        return f"AI Analysis Error: {str(e)}"

After integrating Gemini AI in our script, the final reconnaissance script will look something like this:

import subprocess
import json
import re
import whois
import os
import tempfile
import uuid
from datetime import datetime
import google.generativeai as genai

def clean_output(output):
    # Remove ANSI escape sequences (like \u001b[32m)
    ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
    return ansi_escape.sub('', output)

# Ask user for the target domain
DOMAIN = input("Enter the target domain: ").strip()

# Function to perform WHOIS lookup
def get_whois_info(domain):
    try:
        w = whois.whois(domain)
        return {
            "domain_name": w.domain_name,
            "registrar": w.registrar,
            "creation_date": str(w.creation_date),
            "expiration_date": str(w.expiration_date),
            "name_servers": w.name_servers
        }
    except Exception as e:
        return {"error": str(e)}

def run_nmap_scan(domain):
    try:
        print("[*] Running Nmap scan...")
        result = subprocess.run(["nmap", "--top-ports", "1000", domain], capture_output=True, text=True)
        cleaned = clean_output(result.stdout)
        return cleaned.splitlines()
    except Exception as e:
        return str(e)

def run_sslscan(domain):
    try:
        print("[*] Running SSLScan...")
        result = subprocess.run(["sslscan", domain], capture_output=True, text=True)
        cleaned = clean_output(result.stdout)
        return cleaned.splitlines()
    except Exception as e:
        return str(e)

def run_web_fingerprint(domain):
    try:
        print("[*] Running Web Fingerprinting...")
        result = subprocess.run(["whatweb", domain], capture_output=True, text=True)
        cleaned = clean_output(result.stdout)
        return cleaned.splitlines()
    except Exception as e:
        return {"error": str(e)}

def run_dirsearch(domain):
    try:
        print("[*] Running Dirsearch...")
        result = subprocess.run(["dirsearch", "-u", f"https://{domain}"], capture_output=True, text=True)
        return extract_dirsearch_results(result.stdout)
    except Exception as e:
        return {"error": str(e)}

def extract_dirsearch_results(output):
    findings = []
    for line in output.split("\n"):
        if re.search(r"\[\d{3}\]\s+=>", line):
            findings.append(line.strip())
    return findings if findings else ["No directories/files found."]

def clean_ansi_codes(text):
    ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
    return ansi_escape.sub('', text)    


def run_amass(domain):
    try:
        print("[*] Running Amass ...")
        result = subprocess.run(
            ["amass", "enum", "-d", domain],
            capture_output=True, text=True
        )
        return result.stdout.splitlines()
    except Exception as e:
        return {"error": str(e)}
        
        
def run_secretfinder(domain):
    secretfinder_path = os.path.expanduser("~/SecretFinder")  # Resolves to ~/SecretFinder in the home directory
    if not os.path.exists(secretfinder_path):
        return {"error": f"SecretFinder directory not found at {secretfinder_path}"}
    
    try:
        print("[*] Running SecretFinder...")
        # Run SecretFinder to capture the output directly
        result = subprocess.run([
            "python3", "SecretFinder.py", "-o", "cli", "-e", "-i", f"https://{domain}"
        ], cwd=secretfinder_path, capture_output=True, text=True, check=True)
        
        cleaned = clean_output(result.stdout)
        return cleaned.splitlines()
    
    except Exception as e:
        return {"error": str(e)}

def run_nuclei(domain):
    try:
        print("[*] Running Nuclei with all templates...")
        # Expand the tilde in the template path to the user's home directory
        nuclei_templates_path = os.path.expanduser("~/nuclei-templates/")
        
        # Run the Nuclei scan and capture both stdout and stderr
        result = subprocess.run([
            "nuclei", "-u", f"https://{domain}", "-t", nuclei_templates_path
        ], capture_output=True, text=True)
        
        # If there is output in stdout, split it into lines and return
        if result.stdout:
            cleaned = clean_output(result.stdout)
            return cleaned.splitlines()
        # If there's error output, capture it and return as an error
        elif result.stderr:
            return {"error": result.stderr.strip().splitlines()}
        else:
            return {"error": "No output from Nuclei scan."}
    
    except Exception as e:
        return {"error": str(e)}


def run_dnsrecon(domain):
    try:
        print("[*] Running DNSRecon...")
        
        # Create a temp file path in the home directory
        temp_dir = os.path.expanduser("~")
        temp_file = os.path.join(temp_dir, f"dnsrecon_temp_{os.urandom(8).hex()}.json")
        
        # Run DNSRecon
        result = subprocess.run([
            "dnsrecon", "-d", domain, "-a", "-j", temp_file
        ], capture_output=True, text=True)
        
        if os.path.exists(temp_file):
            with open(temp_file, "r") as f:
                raw_data = json.load(f)

            # Filter out DNSRecon's internal ScanInfo (optional)
            data = [entry for entry in raw_data if entry.get("type") != "ScanInfo"]
            os.remove(temp_file)  # Clean up

            return {
                "ScanInfo": [
                    {
                        "arguments": f"dnsrecon -d {domain} -a -j {temp_file}",
                        "date": str(datetime.now()),
                        "type": "ScanInfo"
                    }
                ],
                "results": data
            }
        else:
            return {"error": "DNSRecon did not produce an output file."}

    except Exception as e:
        return {"error": str(e)}
        
def analyze_with_gemini(recon_data, api_key):
    try:
        genai.configure(api_key=api_key)
        model = genai.GenerativeModel("gemini-2.0-flash")

        prompt = f"""
        Analyze the following reconnaissance JSON report for the domain `{recon_data.get("domain")}`:

        {json.dumps(recon_data, indent=2)}

        Please provide:
        1. Key positive findings (what looks secure or well-configured)
        2. Key concerns or risks identified
        3. Specific actionable next steps to improve security posture

        Be concise and structured.
        """

        response = model.generate_content(prompt)
        return response.text

    except Exception as e:
        return f"AI Analysis Error: {str(e)}"

# Run all recon tools
print("\n[+] Starting Recon on:", DOMAIN)
recon_results = {
    "domain": DOMAIN,
    "whois": get_whois_info(DOMAIN),
    "port_scan": run_nmap_scan(DOMAIN),
    "sslscan": run_sslscan(DOMAIN),
    "web_fingerprint": run_web_fingerprint(DOMAIN),
    "directory_bruteforce": run_dirsearch(DOMAIN),
    "amass_subdomains": run_amass(DOMAIN),
    "secretfinder": run_secretfinder(DOMAIN),
    "nuclei_findings": run_nuclei(DOMAIN),
    "dnsrecon": run_dnsrecon(DOMAIN),
}

GEMINI_API_KEY = "YOUR_API_KEY"
print("[*] Running AI analysis via Gemini...")
ai_analysis = analyze_with_gemini(recon_results, GEMINI_API_KEY)
recon_results["ai_analysis"] = ai_analysis.splitlines()

# Save results to JSON file
output_file = "recon_results.json"
with open(output_file, "w") as json_file:
    json.dump(recon_results, json_file, indent=4)

print(f"\n[✔] Recon results saved to {output_file}")

Conclusion

By integrating Gemini AI into your recon workflow, you turn a noisy data dump into actionable intelligence. This isn’t just automation—it’s augmentation. Whether you’re red teaming or building internal security tools, AI is now your best recon buddy.

If you want to learn more about how to create recon script, please read following blog from our website : Automated Reconnaissance Testing

Picture of Deepansh Gupta

Deepansh Gupta

Deepansh is a Quality Analyst with 3+ years of experience in both manual and automation testing. He has worked on various tech stacks which include technologies such as Selenium, RestAssured, Gatling among others.

Leave a Comment

Suggested Article

Discover more from NashTech Blog

Subscribe now to keep reading and get access to the full archive.

Continue reading