Automated Reconnaissance For Penetration Testing

Deepansh Gupta

Introduction

In penetration testing, reconnaissance is the first and most crucial phase. It involves gathering as much information as possible about the target to identify potential vulnerabilities. Automating this process helps save time and ensures that no crucial information is missed. In this blog, we will explore a Python script that helps with automated reconnaissance using popular security tools like WHOIS lookup, Nmap, SSLScan, WhatWeb, and Dirsearch.

This script collects essential data about a target domain, including domain registration details, open ports, SSL certificate information, web fingerprinting data, and directory enumeration results. The results are then saved in a structured JSON file, making it easy to analyze and utilize for further penetration testing activities.

Tools Used and Their Purpose

1. WHOIS Lookup

Why? WHOIS provides critical information about a domain’s registration details, including the registrar, creation date, expiration date, and name servers. This information helps in identifying domain ownership and possible attack surfaces.

How it helps in penetration testing?

Identifying domain ownership can help assess if the target is part of a larger infrastructure.
Expired or soon-to-expire domains could be at risk of takeover.

2. Nmap (Network Mapper)

Why? Nmap is one of the most widely used network scanning tools. It helps discover open ports and running services on a target domain.

How it helps in penetration testing?

Identifies open ports that could be exploited (e.g., SSH, FTP, or database ports).
Provides insights into the services running on those ports and their versions.

3. SSLScan

Why? SSLScan checks for SSL/TLS configurations and vulnerabilities related to encryption protocols.

How it helps in penetration testing?

Identifies weak or deprecated SSL/TLS versions.
Detects SSL certificate expiration, weak ciphers, and potential misconfigurations that could be exploited.

4. WhatWeb (Web Fingerprinting)

Why? WhatWeb identifies technologies used by the target website, such as CMS (WordPress, Joomla), frameworks, server details, and plugins.

How it helps in penetration testing?

Knowing the CMS and frameworks helps attackers find publicly known vulnerabilities (CVE exploits).
Server details may reveal outdated software or misconfigurations.

5. Dirsearch (Directory Enumeration)

Why? Dirsearch brute-forces directories and files on a web server to identify hidden or unlisted resources.

How it helps in penetration testing?

Can reveal sensitive files such as admin panels, config files, or backup files.
Helps discover endpoints that were not intended to be public.

How The Script Works

Prerequisites

Before running the script, ensure you have the required tools installed:

sudo apt install nmap sslscan whatweb
pip install python-whois

//additionally, also install dirsearch
git clone https://github.com/maurosoria/dirsearch.git
cd dirsearch
pip install -r requirements.txt

The Python Script

import subprocess
import json
import re
import whois

# Ask user for the target domain
DOMAIN = input("Enter the target domain: ").strip()

# Function to perform WHOIS lookup
def get_whois_info(domain):
    try:
        w = whois.whois(domain)
        return {
            "domain_name": w.domain_name,
            "registrar": w.registrar,
            "creation_date": str(w.creation_date),
            "expiration_date": str(w.expiration_date),
            "name_servers": w.name_servers
        }
    except Exception as e:
        return {"error": str(e)}

# Function to perform Port Scanning using Nmap
def run_nmap_scan(domain):
    try:
        print("[*] Running Nmap scan...")
        result = subprocess.run(
            ["nmap", "--top-ports", "1000", domain],
            capture_output=True,
            text=True
        )
        return result.stdout
    except Exception as e:
        return {"error": str(e)}

# Function to perform SSL Scan (Raw Output)
def run_sslscan(domain):
    try:
        print("[*] Running SSL scan...")
        result = subprocess.run(
            ["sslscan", domain],
            capture_output=True,
            text=True
        )
        return result.stdout  # Save raw output as-is
    except Exception as e:
        return {"error": str(e)}

# Function to perform Web Fingerprinting
def run_web_fingerprint(domain):
    try:
        print("[*] Running Web Fingerprinting...")
        result = subprocess.run(
            ["whatweb", domain],
            capture_output=True,
            text=True
        )
        return clean_ansi_codes(result.stdout)
    except Exception as e:
        return {"error": str(e)}

# Function to perform Directory Bruteforce (Only Findings)
def run_dirsearch(domain):
    try:
        print("[*] Running Dirsearch...")
        result = subprocess.run(
            ["dirsearch", "-u", f"https://{domain}"],
            capture_output=True,
            text=True
        )
        return extract_dirsearch_results(result.stdout)
    except Exception as e:
        return {"error": str(e)}

# Extract only found directories/files from Dirsearch output
def extract_dirsearch_results(output):
    findings = []
    for line in output.split("\n"):
        if re.search(r"\[\d{3}\]\s+=>", line):  # Match response codes like [200] => /admin
            findings.append(line.strip())
    return findings if findings else ["No directories/files found."]

# Function to clean ANSI escape codes (e.g., color codes)
def clean_ansi_codes(text):
    ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
    return ansi_escape.sub('', text)

# Run all recon tools
print("\n[+] Starting Recon on:", DOMAIN)
recon_results = {
    "domain": DOMAIN,
    "whois": get_whois_info(DOMAIN),
    "port_scan": run_nmap_scan(DOMAIN),
    "sslscan": run_sslscan(DOMAIN),  # Raw output
    "web_fingerprint": run_web_fingerprint(DOMAIN),
    "directory_bruteforce": run_dirsearch(DOMAIN)  # Cleaned findings only
}

# Save results to JSON file
output_file = "recon_results.json"
with open(output_file, "w") as json_file:
    json.dump(recon_results, json_file, indent=4)

print(f"\n[✔] Recon results saved to {output_file}")

Steps

User inputs a target domain.
The script runs:
- WHOIS lookup
- Nmap scan for top 1000 ports
- SSLScan for SSL/TLS vulnerabilities
- WhatWeb for web fingerprinting
- Dirsearch for directory enumeration (only meaningful results are saved)
Results are collected and stored in a JSON file for further analysis.

Running The Automated Reconnaissance Script

Run the script and enter the target domain when prompted:

python recon_script.py

Expected Output

Once executed, the script generates a JSON file (recon_results.json) with structured output, including:

{
    "domain": "example.com",
    "whois": {
        "domain_name": "example.com",
        "registrar": "GoDaddy",
        "creation_date": "2003-04-01",
        "expiration_date": "2026-04-01",
        "name_servers": ["ns1.example.com", "ns2.example.com"]
    },
    "port_scan": "Open ports: 22, 80, 443",
    "sslscan": "(Raw SSL scan output)",
    "web_fingerprint": "WordPress 5.8, Apache 2.4",
    "directory_bruteforce": ["/admin - 200 OK", "/backup - 403 Forbidden"]
}

How This Helps in Penetration Testing

Identifies outdated software or weak SSL configurations.
Finds open ports and services that could be exploited.
Detects exposed directories that may contain sensitive information.
Helps with further targeted exploitation (e.g., admin panels, login pages, API endpoints).

Conclusion

This script helps with automated reconnaissance phase in penetration testing. It gathers critical information about a target and presents it in a structured format for further analysis. By leveraging tools like WHOIS, Nmap, SSLScan, WhatWeb, and Dirsearch, security professionals can efficiently identify potential weaknesses in a target’s infrastructure.

Also read: Secure docker containers with security testing

Deepansh Gupta

Deepansh is a Quality Analyst with 3+ years of experience in both manual and automation testing. He has worked on various tech stacks which include technologies such as Selenium, RestAssured, Gatling among others.

Leave a Comment Cancel Reply

Suggested Article