Certified Web Application Pentester
Task 1
Passive Reconnaissance Fundamentals
1. What is Passive Reconnaissance
Passive reconnaissance gathers information about a target without directly interacting with it. Because no packets are sent to the target's infrastructure, the activity leaves no trace in the target's logs (though third-party sources may keep records of your queries).
```
Active Recon:  Attacker --> [packets] --> Target               (detectable)
Passive Recon: Attacker --> [queries] --> Third-party sources  (undetectable by target)
```
2. WHOIS Enumeration
```bash
# Domain WHOIS
whois target.com
# Key information to extract:
# - Registrant name, email, organization
# - Name servers (hosting provider clues)
# - Registration/expiration dates
# - Registrar information

# IP WHOIS
whois 93.184.216.34
# Key information:
# - IP range/CIDR block owned
# - Organization name
# - Abuse contact
# - Network name (NetName)

# Reverse WHOIS (find domains by registrant)
# amass intel -whois -d target.com
# reversewhois.io
# whoxy.com API
```
3. DNS Reconnaissance
```bash
# All DNS records
dig target.com ANY

# Specific record types
dig target.com A      # IPv4 address
dig target.com AAAA   # IPv6 address
dig target.com MX     # Mail servers
dig target.com NS     # Name servers
dig target.com TXT    # TXT records (SPF, DKIM, verification tokens)
dig target.com CNAME  # Canonical names
dig target.com SOA    # Start of Authority
dig target.com SRV    # Service records

# Using a specific DNS server
dig @8.8.8.8 target.com A
dig @1.1.1.1 target.com A

# Reverse DNS
dig -x 93.184.216.34

# Zone transfer attempt
dig axfr @ns1.target.com target.com

# DNS trace (follow delegation chain)
dig +trace target.com

# Short output
dig +short target.com A
dig +short target.com MX
```
DNS Record Security Implications
| Record Type | Security Relevance |
|---|---|
| A/AAAA | Direct IP, hosting provider identification |
| MX | Email infrastructure, phishing targets |
| NS | DNS provider, potential takeover |
| TXT | SPF/DKIM/DMARC config, domain verification tokens |
| CNAME | Subdomain takeover candidates |
| SRV | Internal services exposed |
| SOA | Zone admin email, serial number |
4. Certificate Transparency Logs
```bash
# crt.sh - query CT logs
curl -s "https://crt.sh/?q=%25.target.com&output=json" | jq -r '.[].name_value' | sort -u

# Filter for unique subdomains
curl -s "https://crt.sh/?q=%25.target.com&output=json" | \
    jq -r '.[].name_value' | \
    sed 's/\*\.//g' | \
    sort -u > ct_subdomains.txt

# censys.io certificates API
# https://search.censys.io/certificates?q=target.com

# Google Certificate Transparency
# https://transparencyreport.google.com/https/certificates

# Facebook CT monitoring
# https://developers.facebook.com/tools/ct/
```
5. Search Engine Dorking
Google Dorks
```
# Find login pages
site:target.com inurl:login
site:target.com inurl:admin
site:target.com intitle:"login" OR intitle:"sign in"

# Find sensitive files
site:target.com filetype:pdf
site:target.com filetype:xlsx OR filetype:csv
site:target.com filetype:sql
site:target.com filetype:env
site:target.com filetype:log
site:target.com filetype:bak
site:target.com filetype:conf OR filetype:cfg
site:target.com filetype:xml

# Find exposed directories
site:target.com intitle:"index of"
site:target.com intitle:"directory listing"

# Find error messages
site:target.com "php error" OR "sql syntax" OR "undefined index"
site:target.com "stack trace" OR "traceback"

# Find API documentation
site:target.com inurl:api
site:target.com inurl:swagger OR inurl:api-docs
site:target.com filetype:json inurl:openapi

# Find sensitive information
site:target.com "password" filetype:txt
site:target.com "api_key" OR "apikey" OR "api-key"
site:target.com "BEGIN RSA PRIVATE KEY"

# Find subdomains
site:*.target.com -www

# WordPress-specific
site:target.com inurl:wp-content
site:target.com inurl:wp-admin
site:target.com filetype:xml inurl:sitemap

# Cached/old versions
cache:target.com
```
Other Search Engines
```
# Bing
site:target.com

# DuckDuckGo
site:target.com

# Yandex (good for .ru domains)
site:target.com

# Shodan
hostname:target.com
org:"Target Organization"
ssl.cert.subject.cn:target.com

# Censys
services.tls.certificates.leaf.names:target.com

# ZoomEye
site:target.com

# FOFA
domain="target.com"
```
6. Web Archive Analysis
```bash
# Wayback Machine URLs
waybackurls target.com > wayback_urls.txt

# gau (GetAllUrls) - multiple sources
gau target.com > gau_urls.txt
gau --subs target.com > gau_with_subs.txt

# Combine and deduplicate
cat wayback_urls.txt gau_urls.txt | sort -u > all_historical_urls.txt

# Filter for interesting endpoints
cat all_historical_urls.txt | grep -iE "\.(php|asp|aspx|jsp|json|xml|config|env|sql|bak|old|backup)" > interesting_urls.txt

# Filter for parameters
cat all_historical_urls.txt | grep "?" | sort -u > parameterized_urls.txt

# Filter for API endpoints
cat all_historical_urls.txt | grep -iE "(api|graphql|rest|v1|v2|v3)" > api_urls.txt

# Check which URLs are still alive
cat interesting_urls.txt | httpx -silent -status-code -title > alive_urls.txt

# Wayback Machine snapshots via API
curl -s "https://web.archive.org/cdx/search/cdx?url=target.com/*&output=json&fl=timestamp,original,statuscode,mimetype" | jq .
```
7. Technology Fingerprinting
```bash
# Wappalyzer (browser extension or CLI)
# Identifies: CMS, frameworks, programming languages, servers, analytics

# BuiltWith
# https://builtwith.com/target.com

# Netcraft
# https://sitereport.netcraft.com/?url=target.com

# WhatRuns (browser extension)

# Check HTTP headers for technology hints
curl -sI https://target.com | grep -iE "(server|x-powered|x-aspnet|x-generator|x-drupal|x-framework)"

# robots.txt analysis
curl -s https://target.com/robots.txt

# Common technology indicators in HTML
curl -s https://target.com | grep -ioE "(wp-content|drupal|joomla|laravel|django|rails|angular|react|vue|next|nuxt)"

# favicon hash for identification
curl -s https://target.com/favicon.ico | md5sum
# Compare the MD5 against favicon hash databases; note that Shodan's
# http.favicon.hash:<hash_value> filter uses an mmh3 hash of the
# base64-encoded favicon, not MD5
```
8. Email Harvesting
```bash
# theHarvester
theHarvester -d target.com -b google,bing,linkedin,twitter -l 500

# hunter.io
# https://hunter.io/domain-search (API available)

# Phonebook.cz
# https://phonebook.cz

# Clearbit Connect
# Browser extension for email discovery

# Verify emails
# emailhippo.com
# verify-email.org

# LinkedIn OSINT
# Search for employees: site:linkedin.com "target company"
# Extract names → generate email patterns

# Common email patterns to try
# first.last@target.com
# flast@target.com
# f.last@target.com
# first@target.com
# firstlast@target.com
```
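The name-to-pattern expansion described above can be scripted. A minimal sketch — the `generate_emails` helper name and the exact pattern set are illustrative, not from any specific tool:

```shell
# generate_emails: expand a "First Last" name into common corporate
# email-address patterns (helper and pattern set are illustrative)
generate_emails() {
    first=$(echo "$1" | tr 'A-Z' 'a-z')
    last=$(echo "$2" | tr 'A-Z' 'a-z')
    domain=$3
    f=$(printf '%.1s' "$first")   # first initial
    printf '%s\n' \
        "${first}.${last}@${domain}" \
        "${f}${last}@${domain}" \
        "${first}${last}@${domain}" \
        "${first}@${domain}" \
        "${f}.${last}@${domain}"
}

generate_emails John Doe target.com
```

Feed the generated candidates to an email verification service rather than mailing them directly.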
9. Social Media OSINT
```bash
# GitHub/GitLab reconnaissance
# Search for organization repos
# https://github.com/target-org

# Search code for secrets
# GitHub: org:target-org password
# GitHub: org:target-org api_key
# GitHub: org:target-org secret
# GitHub: org:target-org AWS_ACCESS_KEY

# GitDorker - automated GitHub dorking
# python3 GitDorker.py -t <github_token> -org target-org

# truffleHog - find secrets in git history
trufflehog github --org=target-org

# LinkedIn
# Company page → employees list
# Technology stack from job postings

# Twitter/X
# @target_company tweets for technology mentions
# Employees discussing internal tools

# Pastebin/paste sites
# Search for target.com on paste sites
# Dehashed, IntelligenceX for leaked data
```
10. Shodan and Internet-Wide Scan Data
```bash
# Shodan CLI
shodan search "hostname:target.com"
shodan search "org:\"Target Organization\""
shodan search "ssl.cert.subject.cn:target.com"
shodan host 93.184.216.34

# Shodan filters
# port:    Specific port
# country: Country code
# city:    City name
# os:      Operating system
# product: Software name
# version: Software version
# vuln:    CVE number

# Censys
# https://search.censys.io
# services.tls.certificates.leaf.names:target.com

# BinaryEdge
# https://app.binaryedge.io

# GreyNoise
# https://viz.greynoise.io

# Shodan dorks for specific services
# "Apache" hostname:target.com
# "nginx" hostname:target.com
# "Microsoft-IIS" hostname:target.com
# "X-Powered-By: PHP" hostname:target.com
# "Set-Cookie: JSESSIONID" hostname:target.com
```
11. Metadata Analysis
```bash
# Extract metadata from documents
exiftool document.pdf
exiftool -a -u -g1 document.pdf

# FOCA (Windows tool for metadata extraction)
# Batch download and analyze documents

# Download all PDFs from target
wget -r -l1 -A pdf https://target.com/

# Extract metadata from all
find . -name "*.pdf" -exec exiftool {} \; > metadata_results.txt

# Key metadata to look for:
# - Author names (internal usernames)
# - Software versions
# - Internal paths (C:\Users\john\...)
# - Printer names
# - GPS coordinates
# - Email addresses
# - Creation/modification dates
```
12. Passive Recon Automation Script
```bash
#!/bin/bash
# passive_recon.sh - Automated passive reconnaissance

TARGET=$1
echo "[*] Starting passive recon for: $TARGET"
mkdir -p recon/$TARGET

# WHOIS
echo "[*] WHOIS lookup..."
whois $TARGET > recon/$TARGET/whois.txt

# DNS records
echo "[*] DNS enumeration..."
for type in A AAAA MX NS TXT SOA CNAME; do
    dig +short $TARGET $type >> recon/$TARGET/dns_records.txt
done

# Certificate Transparency
echo "[*] Certificate transparency..."
curl -s "https://crt.sh/?q=%25.$TARGET&output=json" | \
    jq -r '.[].name_value' 2>/dev/null | \
    sed 's/\*\.//g' | sort -u > recon/$TARGET/ct_subdomains.txt

# Wayback URLs
echo "[*] Wayback Machine URLs..."
waybackurls $TARGET 2>/dev/null | sort -u > recon/$TARGET/wayback_urls.txt

# gau
echo "[*] GAU URLs..."
gau $TARGET 2>/dev/null | sort -u > recon/$TARGET/gau_urls.txt

# Combine URLs
cat recon/$TARGET/wayback_urls.txt recon/$TARGET/gau_urls.txt | \
    sort -u > recon/$TARGET/all_urls.txt

echo "[*] Results saved to recon/$TARGET/"
echo "[*] Subdomains found: $(wc -l < recon/$TARGET/ct_subdomains.txt)"
echo "[*] URLs collected: $(wc -l < recon/$TARGET/all_urls.txt)"
```
Task 2
DNS Enumeration and Zone Transfers
1. DNS Architecture for Pentesters
```
Client Query → Recursive Resolver → Root NS → TLD NS → Authoritative NS
                                                              ↓
                                                        DNS Response
```
DNS Record Types Deep Dive
```bash
# A Record - IPv4 address mapping
dig target.com A +short
# 93.184.216.34

# AAAA Record - IPv6 address mapping
dig target.com AAAA +short
# 2606:2800:220:1:248:1893:25c8:1946

# MX Record - Mail servers (priority, server)
dig target.com MX +short
# 10 mail1.target.com.
# 20 mail2.target.com.

# NS Record - Authoritative name servers
dig target.com NS +short
# ns1.target.com.
# ns2.target.com.

# SOA Record - Zone authority
dig target.com SOA +short
# ns1.target.com. admin.target.com. 2024010101 3600 900 604800 86400

# TXT Record - Text records (SPF, DKIM, verification)
dig target.com TXT +short
# "v=spf1 include:_spf.google.com ~all"
# "google-site-verification=..."

# CNAME Record - Alias/canonical name
dig www.target.com CNAME +short
# target.com.

# SRV Record - Service location
dig _sip._tcp.target.com SRV +short
# 10 60 5060 sip.target.com.

# PTR Record - Reverse DNS
dig -x 93.184.216.34 +short
# target.com.

# CAA Record - Certificate Authority Authorization
dig target.com CAA +short
# 0 issue "letsencrypt.org"
```
2. Zone Transfer Attacks (AXFR)
```bash
# Identify name servers first
dig target.com NS +short

# Attempt zone transfer from each NS
dig axfr target.com @ns1.target.com
dig axfr target.com @ns2.target.com

# Using host command
host -t axfr target.com ns1.target.com

# Using nmap
nmap --script dns-zone-transfer -p 53 ns1.target.com

# IXFR (Incremental Zone Transfer) - requires a starting serial from the SOA
dig ixfr=2024010101 target.com @ns1.target.com

# What a successful zone transfer reveals:
# - ALL subdomains and their IP addresses
# - Internal hostnames
# - Mail servers
# - Service records
# - Network architecture
```
Zone Transfer Automation
```bash
#!/bin/bash
# zone_transfer.sh
TARGET=$1
echo "[*] Attempting zone transfers for $TARGET"

# Get name servers
NS_SERVERS=$(dig +short NS $TARGET)

for ns in $NS_SERVERS; do
    echo "[*] Trying zone transfer from: $ns"
    result=$(dig axfr $TARGET @$ns 2>&1)
    # dig prints "Transfer failed." (or times out) on refusal
    if ! echo "$result" | grep -qiE "transfer failed|timed out|communications error"; then
        echo "[+] Zone transfer successful from $ns!"
        echo "$result" > "${TARGET}_zone_transfer_${ns}.txt"
    else
        echo "[-] Zone transfer failed from $ns"
    fi
done
```
3. Subdomain Enumeration
Passive Subdomain Discovery
```bash
# Subfinder - fast passive subdomain enumeration
subfinder -d target.com -o subfinder_subs.txt
subfinder -d target.com -all -o subfinder_all.txt

# Amass - comprehensive OSINT
amass enum -passive -d target.com -o amass_passive.txt
amass enum -d target.com -o amass_active.txt   # includes active resolution

# Assetfinder
assetfinder target.com > assetfinder_subs.txt
assetfinder --subs-only target.com > assetfinder_subs_only.txt

# Findomain
findomain -t target.com -q > findomain_subs.txt

# Combine all results
cat subfinder_subs.txt amass_passive.txt assetfinder_subs.txt findomain_subs.txt | \
    sort -u > all_subdomains.txt
```
Active Subdomain Bruteforcing
```bash
# DNS bruteforce with common wordlist
gobuster dns -d target.com -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt -t 50

# Shuffledns - massdns wrapper
shuffledns -d target.com -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt -r resolvers.txt

# Puredns - fast DNS bruteforcing
puredns bruteforce /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt target.com -r resolvers.txt

# DNSx - DNS toolkit
cat all_subdomains.txt | dnsx -silent -a -resp > resolved_subs.txt

# Altdns - subdomain alteration
altdns -i all_subdomains.txt -o altered_subs.txt -w words.txt
cat altered_subs.txt | dnsx -silent > resolved_altered.txt

# dnsgen - generate permutations
cat all_subdomains.txt | dnsgen - | dnsx -silent > dnsgen_results.txt

# Common subdomain patterns to check
# dev, staging, test, uat, qa, preprod, beta
# api, api-v2, api-internal
# admin, portal, dashboard, panel
# mail, webmail, smtp, pop, imap
# vpn, remote, gateway
# git, gitlab, bitbucket, jenkins, ci, cd
# jira, confluence, wiki, docs
# db, database, mysql, postgres, mongo, redis
# backup, old, legacy, archive
# cdn, static, assets, media, images
```
4. DNS Security Analysis
SPF Record Analysis
```bash
# Check SPF
dig target.com TXT | grep "v=spf1"

# SPF mechanisms:
#   ip4:x.x.x.x    - Allow specific IPv4
#   ip6:...        - Allow specific IPv6
#   include:domain - Include another domain's SPF
#   a              - Allow domain's A record IPs
#   mx             - Allow domain's MX IPs
#   all            - Match all (with qualifier)

# Qualifiers:
#   +all  PASS     (allow all - very weak)
#   -all  FAIL     (reject non-listed - strong)
#   ~all  SOFTFAIL (mark but accept - moderate)
#   ?all  NEUTRAL  (no policy - weak)

# Weak SPF examples:
#   v=spf1 +all     → allows anyone to send
#   v=spf1 ?all     → neutral policy
#   (no SPF record) → no email auth
```
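The qualifier grading above is mechanical, so it can be scripted. A minimal sketch — `classify_spf` is an illustrative helper name; feed it a record already fetched with `dig target.com TXT +short`:

```shell
# classify_spf: grade the terminal "all" qualifier of an SPF record string
# (illustrative helper; does not fetch the record itself)
classify_spf() {
    case "$1" in
        *"+all"*) echo "WEAK: +all lets anyone send" ;;
        *"-all"*) echo "STRONG: -all rejects unlisted senders" ;;
        *"~all"*) echo "MODERATE: ~all softfails unlisted senders" ;;
        *"?all"*) echo "WEAK: ?all declares no policy" ;;
        *)        echo "WEAK: no terminal all mechanism" ;;
    esac
}

classify_spf "v=spf1 include:_spf.google.com ~all"
```

First match wins, and `+all`/`-all`/`~all`/`?all` are mutually exclusive strings, so the `case` ordering is safe.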
DMARC Record Analysis
```bash
# Check DMARC
dig _dmarc.target.com TXT

# DMARC tags:
#   v=DMARC1     - Version
#   p=none       - No action (monitoring only)
#   p=quarantine - Quarantine failures
#   p=reject     - Reject failures
#   rua=         - Aggregate report URI
#   ruf=         - Forensic report URI
#   pct=         - Percentage of messages to filter
#   sp=          - Subdomain policy

# Weak DMARC:
#   p=none (monitoring only, no enforcement)
#   No DMARC record at all
```
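The `p=` tag can likewise be extracted and graded automatically. A sketch, assuming the record text was already fetched with `dig`; the tag is isolated first so that `sp=reject` is not mistaken for `p=reject`:

```shell
# dmarc_grade: pull the p= policy tag out of a DMARC record and grade it
# (illustrative helper; expects the raw record string as input)
dmarc_grade() {
    # split tags on ';', keep only the one starting with "p="
    policy=$(printf '%s' "$1" | tr -d ' ' | tr ';' '\n' | grep '^p=' | cut -d= -f2)
    case "$policy" in
        reject)     echo "STRONG: p=reject" ;;
        quarantine) echo "MODERATE: p=quarantine" ;;
        none)       echo "WEAK: p=none (monitoring only)" ;;
        *)          echo "WEAK: no DMARC policy tag" ;;
    esac
}

dmarc_grade "v=DMARC1; p=none; rua=mailto:dmarc@target.com"
```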
DKIM Record Analysis
```bash
# DKIM selector discovery
dig google._domainkey.target.com TXT
dig default._domainkey.target.com TXT
dig selector1._domainkey.target.com TXT
dig k1._domainkey.target.com TXT

# Common selectors to check:
# google, default, selector1, selector2, s1, s2, k1, dkim, mail
```
5. DNS Cache Snooping
```bash
# Non-recursive query to check if a domain is cached
dig @ns1.target.com example.com A +norecurse
# If an answer is returned → the domain was recently resolved
# Can reveal what sites employees visit

# Automated cache snooping
nmap --script dns-cache-snoop --script-args 'dns-cache-snoop.domains={facebook.com,gmail.com,slack.com}' -p 53 ns1.target.com
```
6. DNS Rebinding Detection
```bash
# DNS Rebinding attack flow:
# 1. Victim visits attacker.com
# 2. attacker.com resolves to the attacker's IP (first resolution)
# 3. JavaScript loads from the attacker's server
# 4. DNS TTL expires
# 5. attacker.com resolves to an internal IP (127.0.0.1 or 192.168.x.x)
# 6. JavaScript can now access internal services via same-origin

# Detection: check for very low TTL values
# (the TTL is the second column of the answer section)
dig target.com A +noall +answer | awk '{print $2}'
# TTL < 60 seconds may indicate DNS rebinding potential
```
7. Subdomain Takeover via DNS
```bash
# Check for dangling CNAME records
cat all_subdomains.txt | while read sub; do
    cname=$(dig +short CNAME $sub)
    if [ -n "$cname" ]; then
        echo "$sub -> $cname"
    fi
done > cname_records.txt

# Check if CNAME targets are claimable
# Common takeover targets:
# *.s3.amazonaws.com   (404 NoSuchBucket)
# *.herokuapp.com      (No such app)
# *.ghost.io           (404)
# *.github.io          (404)
# *.azurewebsites.net  (404)
# *.cloudfront.net     (Bad Request)
# *.pantheon.io
# *.shopify.com
# *.tumblr.com
# *.wordpress.com
# *.zendesk.com

# Automated check
subjack -w all_subdomains.txt -t 100 -timeout 30 -ssl -c fingerprints.json -v

# Nuclei subdomain takeover templates
nuclei -l all_subdomains.txt -t takeovers/
```
8. DNS Tunneling Detection
```bash
# Signs of DNS tunneling:
# - Unusually long subdomain labels
# - High volume of TXT record queries
# - Base64/hex encoded subdomain labels
# - Queries to suspicious domains with many subdomains

# Monitor DNS traffic
tcpdump -i eth0 port 53 -w dns_capture.pcap

# Analyze with tshark: longest query names first
tshark -r dns_capture.pcap -Y "dns" -T fields -e dns.qry.name | \
    awk '{print length, $0}' | sort -rn | head -20

# DNS tunneling tools (for authorized testing):
# iodine, dnscat2, dns2tcp
```
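The "unusually long label" indicator above can be turned into a standalone filter for an extracted query-name list. A rough heuristic sketch — the function name and the default threshold of 40 characters are assumptions, not a standard:

```shell
# flag_long_labels: print query names from a file that contain at least
# one label longer than a threshold (default 40 - an assumed cutoff;
# legitimate labels rarely approach the 63-character DNS maximum)
flag_long_labels() {
    awk -F. -v t="${2:-40}" '{
        for (i = 1; i <= NF; i++)
            if (length($i) > t) { print $0; next }
    }' "$1"
}
```

Usage: `tshark -r dns_capture.pcap -T fields -e dns.qry.name > queries.txt && flag_long_labels queries.txt`. Tunneled payloads encoded as base64/hex labels tend to hit the 63-character limit, so this surfaces them quickly; combine with entropy analysis to cut false positives.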
9. Reverse DNS Enumeration
```bash
# Reverse DNS on an IP range
# If target owns 93.184.216.0/24
for ip in $(seq 1 254); do
    result=$(dig -x 93.184.216.$ip +short 2>/dev/null)
    if [ -n "$result" ]; then
        echo "93.184.216.$ip -> $result"
    fi
done

# Using dnsrecon
dnsrecon -r 93.184.216.0/24 -n 8.8.8.8

# Using nmap
nmap -sL 93.184.216.0/24 | grep "(" > reverse_dns.txt

# Fierce - DNS reconnaissance tool
fierce --domain target.com
```
10. Comprehensive DNS Recon Script
```bash
#!/bin/bash
# dns_recon.sh - Complete DNS reconnaissance
TARGET=$1
OUTDIR="recon/${TARGET}/dns"
mkdir -p $OUTDIR

echo "=== DNS Reconnaissance: $TARGET ==="

# Standard records
echo "[*] Querying standard DNS records..."
for type in A AAAA MX NS TXT SOA CNAME CAA SRV; do
    echo "--- $type Records ---" >> $OUTDIR/records.txt
    dig $TARGET $type +noall +answer >> $OUTDIR/records.txt
    echo "" >> $OUTDIR/records.txt
done

# Name servers
echo "[*] Identifying name servers..."
dig +short NS $TARGET > $OUTDIR/nameservers.txt

# Zone transfer
echo "[*] Attempting zone transfers..."
while read ns; do
    echo "[*] Trying AXFR from $ns..."
    dig axfr $TARGET @$ns >> $OUTDIR/zone_transfer.txt 2>&1
done < $OUTDIR/nameservers.txt

# SPF/DMARC/DKIM
echo "[*] Checking email security records..."
echo "=== SPF ===" > $OUTDIR/email_security.txt
dig $TARGET TXT | grep "v=spf1" >> $OUTDIR/email_security.txt
echo "=== DMARC ===" >> $OUTDIR/email_security.txt
dig _dmarc.$TARGET TXT >> $OUTDIR/email_security.txt
echo "=== DKIM (common selectors) ===" >> $OUTDIR/email_security.txt
for sel in google default selector1 selector2 k1 dkim mail; do
    dig ${sel}._domainkey.$TARGET TXT +short >> $OUTDIR/email_security.txt 2>/dev/null
done

echo "[*] DNS recon complete. Results in $OUTDIR/"
```
Task 3
Subdomain Discovery Techniques
1. Why Subdomain Discovery Matters
```
Main domain: target.com (hardened, WAF, monitored)
    |
    +-- dev.target.com           (debug mode, weak auth)
    +-- staging.target.com       (outdated code, test data)
    +-- api-internal.target.com  (no authentication)
    +-- old.target.com           (vulnerable legacy app)
    +-- jenkins.target.com       (default credentials)
    +-- backup.target.com        (exposed database dumps)
```
Subdomains often have weaker security than the main domain, making them high-value targets.
2. Passive Subdomain Enumeration
2.1 Certificate Transparency
```bash
# crt.sh
curl -s "https://crt.sh/?q=%25.target.com&output=json" | \
    jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u

# certspotter
curl -s "https://api.certspotter.com/v1/issuances?domain=target.com&include_subdomains=true&expand=dns_names" | \
    jq -r '.[].dns_names[]' | sort -u

# Censys certificates
# censys search "parsed.names: target.com" --index-type certificates
```
2.2 Passive DNS Databases
```bash
# SecurityTrails API
curl -s --header "APIKEY: $SECURITYTRAILS_KEY" \
    "https://api.securitytrails.com/v1/domain/target.com/subdomains" | jq '.subdomains[]'

# VirusTotal
curl -s "https://www.virustotal.com/vtapi/v2/domain/report?apikey=$VT_KEY&domain=target.com" | \
    jq '.subdomains[]'

# AlienVault OTX
curl -s "https://otx.alienvault.com/api/v1/indicators/domain/target.com/passive_dns" | \
    jq '.passive_dns[].hostname' | sort -u

# RapidDNS
curl -s "https://rapiddns.io/subdomain/target.com?full=1" | \
    grep -oP '_blank">\K[^<]*' | sort -u

# Hackertarget
curl -s "https://api.hackertarget.com/hostsearch/?q=target.com" | cut -d, -f1
```
2.3 Multi-Source Tools
```bash
# Subfinder (30+ passive sources)
subfinder -d target.com -all -o subfinder.txt
subfinder -d target.com -all -cs -o subfinder_sources.txt   # show sources
subfinder -dL domains.txt -o subfinder_multi.txt            # multiple domains

# Amass passive
amass enum -passive -d target.com -o amass_passive.txt
amass enum -passive -d target.com -src -o amass_sources.txt # show sources

# Assetfinder
assetfinder --subs-only target.com > assetfinder.txt

# Findomain
findomain -t target.com -u findomain.txt

# Chaos (ProjectDiscovery)
chaos -d target.com -key $CHAOS_KEY -o chaos.txt

# Combine all
cat subfinder.txt amass_passive.txt assetfinder.txt findomain.txt chaos.txt | \
    sort -u > all_passive_subs.txt
echo "[*] Total unique subdomains: $(wc -l < all_passive_subs.txt)"
```
3. Active Subdomain Bruteforcing
3.1 DNS Bruteforcing
```bash
# Gobuster DNS
gobuster dns -d target.com \
    -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt \
    -t 50 -o gobuster_dns.txt

# Massdns (fastest)
massdns -r resolvers.txt -t A -o S -w massdns_results.txt subdomains_wordlist.txt

# Shuffledns (massdns wrapper with wildcard filtering)
shuffledns -d target.com \
    -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt \
    -r resolvers.txt -o shuffledns.txt

# Puredns (wildcard filtering + bruteforce)
puredns bruteforce /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt \
    target.com -r resolvers.txt -w puredns.txt

# Create resolvers list
# Public DNS resolvers for mass DNS resolution
cat > resolvers.txt << 'EOF'
8.8.8.8
8.8.4.4
1.1.1.1
1.0.0.1
9.9.9.9
208.67.222.222
208.67.220.220
EOF

# DNSx - resolve and filter
cat all_subs.txt | dnsx -silent -a -cname -resp > dnsx_resolved.txt
```
3.2 Wordlists for DNS Bruteforcing
```bash
# Best wordlists:
# /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt    (quick)
# /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt   (medium)
# /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt  (thorough)
# /usr/share/seclists/Discovery/DNS/dns-Jhaddix.txt                    (comprehensive)
# /usr/share/seclists/Discovery/DNS/fierce-hostlist.txt
# /usr/share/seclists/Discovery/DNS/namelist.txt
# /usr/share/seclists/Discovery/DNS/deepmagic.com-prefixes-top50000.txt

# Custom wordlist based on discovered subdomains
# Extract patterns from known subdomains
cat known_subs.txt | sed 's/\.target\.com$//' | tr '-' '\n' | tr '.' '\n' | sort -u > custom_words.txt
```
4. Subdomain Permutation and Alteration
```bash
# Altdns - subdomain alteration and permutation
altdns -i known_subs.txt -o altered.txt -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt
cat altered.txt | puredns resolve -r resolvers.txt > resolved_altered.txt

# dnsgen - intelligent permutation
cat known_subs.txt | dnsgen - > dnsgen_perms.txt
cat dnsgen_perms.txt | puredns resolve -r resolvers.txt > resolved_dnsgen.txt

# gotator - generate permutations
gotator -sub known_subs.txt -perm permutation_words.txt -depth 1 -numbers 3 -md > gotator.txt

# Common permutation patterns
# dev-api, api-dev, api-staging, staging-api
# v2-api, api-v2, apiv2
# internal-api, api-internal
# test-app, app-test, app1, app2
# us-east-1, eu-west-1 (regional)
```
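To see what tools like altdns and dnsgen generate under the hood, the hyphen/dot pattern families above can be hand-rolled in a few lines. A simplified sketch — `permute_label` is an illustrative name, and real tools apply many more pattern templates:

```shell
# permute_label: combine one known subdomain label with environment words
# using the prefix, suffix, and nested-subdomain patterns listed above
permute_label() {
    host=$1; domain=$2; shift 2
    for w in "$@"; do
        printf '%s\n' \
            "${w}-${host}.${domain}" \
            "${host}-${w}.${domain}" \
            "${w}.${host}.${domain}"
    done
}

permute_label api target.com dev staging
```

Pipe the output into a resolver (e.g. `permute_label api target.com dev staging | puredns resolve -r resolvers.txt`) to keep only names that actually exist.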
5. Virtual Host Discovery
```bash
# Virtual hosts share the same IP but serve different content based on the Host header

# ffuf vhost discovery
ffuf -u http://TARGET_IP -H "Host: FUZZ.target.com" \
    -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt \
    -fs 0 -mc all

# gobuster vhost
gobuster vhost -u http://TARGET_IP -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt \
    --append-domain -t 50

# Manual vhost testing
curl -s -H "Host: dev.target.com" http://TARGET_IP
curl -s -H "Host: staging.target.com" http://TARGET_IP
curl -s -H "Host: admin.target.com" http://TARGET_IP

# Filter by response size (exclude default/wildcard responses)
# First, get the default response size:
DEFAULT_SIZE=$(curl -s -H "Host: nonexistent12345.target.com" http://TARGET_IP | wc -c)
echo "Default response size: $DEFAULT_SIZE"

# Then filter
ffuf -u http://TARGET_IP -H "Host: FUZZ.target.com" \
    -w wordlist.txt -fs $DEFAULT_SIZE
```
6. Wildcard DNS Detection and Handling
```bash
# Test for wildcard DNS
dig randomnonexistent12345.target.com A +short
# If it returns an IP → wildcard DNS is configured

# Wildcard detection script
RANDOM_SUB=$(cat /dev/urandom | tr -dc 'a-z' | fold -w 20 | head -n 1)
WILDCARD_IP=$(dig +short $RANDOM_SUB.target.com A)
if [ -n "$WILDCARD_IP" ]; then
    echo "[!] Wildcard DNS detected: *.target.com -> $WILDCARD_IP"
    echo "[*] Filter results by this IP"
fi

# Tools that handle wildcards automatically:
# puredns (built-in wildcard detection)
# shuffledns (built-in wildcard detection)
# massdns + wildcard filtering post-processing
```
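Once the wildcard IP is known, bruteforce results can be post-filtered by dropping every hit that resolves to it — the same idea puredns and shuffledns implement internally. A sketch under an assumed input format: a file of `hostname ip` pairs (e.g. massdns/dnsx output normalized to two columns):

```shell
# drop_wildcard_hits: given "host ip" lines and the detected wildcard IP,
# print only hosts resolving somewhere else (input format is an assumption)
drop_wildcard_hits() {
    awk -v wc="$2" '$2 != wc { print $1 }' "$1"
}
```

Usage: `drop_wildcard_hits resolved.txt "$WILDCARD_IP" > real_subs.txt`. Note this misses real subdomains that legitimately share the wildcard IP; response-content comparison is needed to catch those.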
7. Subdomain Validation and Probing
```bash
# Resolve all subdomains
cat all_subs.txt | dnsx -silent -a -resp -o resolved.txt

# HTTP probing - find live web services
cat all_subs.txt | httpx -silent -status-code -title -tech-detect -web-server \
    -content-length -follow-redirects -o httpx_results.txt

# Filter interesting results
cat httpx_results.txt | grep -E "200|301|302|401|403" > live_web.txt
cat httpx_results.txt | grep "401\|403" > auth_required.txt
cat httpx_results.txt | grep -i "admin\|panel\|dashboard\|portal" > admin_panels.txt

# Screenshot all live hosts
gowitness file -f live_subs.txt -P screenshots/
# or
aquatone < live_subs.txt

# Nuclei scan on all subdomains
nuclei -l live_subs.txt -t technologies/ -o tech_detection.txt
```
8. Subdomain Takeover
```bash
# What makes a subdomain vulnerable to takeover:
# 1. CNAME points to an external service
# 2. The external service account is deleted/unconfigured
# 3. Attacker can claim the service and serve content on the subdomain

# Check for CNAME records
cat all_subs.txt | dnsx -cname -resp -o cnames.txt

# Automated takeover checking
subjack -w all_subs.txt -t 100 -timeout 30 -ssl -c /path/to/fingerprints.json -v -o subjack_results.txt

# Nuclei takeover templates
nuclei -l all_subs.txt -t http/takeovers/ -o takeover_results.txt

# can-i-take-over-xyz reference
# https://github.com/EdOverflow/can-i-take-over-xyz

# Manual verification for common services:
# AWS S3:       "NoSuchBucket" error
# GitHub Pages: 404 with GitHub branding
# Heroku:       "No such app"
# Azure:        NXDOMAIN for *.azurewebsites.net
# Shopify:      "Sorry, this shop is currently unavailable"
# Fastly:       "Fastly error: unknown domain"
```
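The manual-verification fingerprints above lend themselves to a tiny triage helper — a sketch, not a scanner; `takeover_hint` is an illustrative name and only covers a few of the services listed:

```shell
# takeover_hint: map a fetched response body to a likely claimable service
# using the error fingerprints listed above (string matching only)
takeover_hint() {
    case "$1" in
        *NoSuchBucket*)                    echo "AWS S3" ;;
        *"No such app"*)                   echo "Heroku" ;;
        *"Fastly error: unknown domain"*)  echo "Fastly" ;;
        *"shop is currently unavailable"*) echo "Shopify" ;;
        *)                                 echo "no known fingerprint" ;;
    esac
}

# Example usage against a dangling CNAME candidate:
# takeover_hint "$(curl -s https://dangling.target.com)"
```

A body match is only a lead — always confirm the CNAME target is actually unclaimed before reporting.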
9. Scope Expansion Techniques
```bash
# Find related domains via reverse IP
# All domains on the same IP
curl -s "https://api.hackertarget.com/reverseiplookup/?q=93.184.216.34"

# Find domains in the same IP range
# whois the IP → find CIDR block → reverse DNS on the range

# Find related organizations via ASN
whois -h whois.radb.net -- '-i origin AS12345'
# or
curl -s "https://api.bgpview.io/asn/12345/prefixes"

# Google Analytics / AdSense ID tracking
# If target uses UA-12345678 in Google Analytics,
# search for other sites with the same UA ID
# builtwith.com → Relationship Profile

# Find domains by the same registrant
# whoxy.com reverse WHOIS

# Favicon hash matching on Shodan
curl -s https://target.com/favicon.ico | python3 -c "
import mmh3, sys, codecs
favicon = codecs.encode(sys.stdin.buffer.read(), 'base64')
print(f'http.favicon.hash:{mmh3.hash(favicon)}')"
```
10. Complete Subdomain Discovery Pipeline
```bash
#!/bin/bash
# subdomain_discovery.sh - Complete pipeline
TARGET=$1
OUTDIR="recon/${TARGET}/subdomains"
RESOLVERS="resolvers.txt"
mkdir -p $OUTDIR

echo "=== Subdomain Discovery Pipeline: $TARGET ==="

# Phase 1: Passive collection
echo "[1/6] Passive enumeration..."
subfinder -d $TARGET -all -silent > $OUTDIR/subfinder.txt 2>/dev/null
amass enum -passive -d $TARGET -o $OUTDIR/amass.txt 2>/dev/null
assetfinder --subs-only $TARGET > $OUTDIR/assetfinder.txt 2>/dev/null
findomain -t $TARGET -q > $OUTDIR/findomain.txt 2>/dev/null

# CT logs
curl -s "https://crt.sh/?q=%25.$TARGET&output=json" | \
    jq -r '.[].name_value' 2>/dev/null | sed 's/\*\.//g' | sort -u > $OUTDIR/crt.txt

# Combine passive results
cat $OUTDIR/*.txt | sort -u > $OUTDIR/passive_all.txt
echo "[*] Passive subdomains: $(wc -l < $OUTDIR/passive_all.txt)"

# Phase 2: Active bruteforcing
echo "[2/6] DNS bruteforcing..."
puredns bruteforce /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt \
    $TARGET -r $RESOLVERS -w $OUTDIR/bruteforce.txt 2>/dev/null

# Phase 3: Permutations
echo "[3/6] Generating permutations..."
cat $OUTDIR/passive_all.txt | dnsgen - 2>/dev/null | \
    puredns resolve -r $RESOLVERS 2>/dev/null > $OUTDIR/permutations.txt

# Phase 4: Combine all
echo "[4/6] Combining results..."
cat $OUTDIR/passive_all.txt $OUTDIR/bruteforce.txt $OUTDIR/permutations.txt | \
    sort -u > $OUTDIR/all_subdomains.txt
echo "[*] Total unique subdomains: $(wc -l < $OUTDIR/all_subdomains.txt)"

# Phase 5: Resolve and validate
echo "[5/6] Resolving subdomains..."
cat $OUTDIR/all_subdomains.txt | dnsx -silent -a -resp > $OUTDIR/resolved.txt

# Phase 6: HTTP probing
echo "[6/6] HTTP probing..."
cat $OUTDIR/all_subdomains.txt | httpx -silent -status-code -title -tech-detect \
    -web-server -o $OUTDIR/httpx_results.txt

echo "=== Discovery Complete ==="
echo "[*] Total subdomains: $(wc -l < $OUTDIR/all_subdomains.txt)"
echo "[*] Live web services: $(wc -l < $OUTDIR/httpx_results.txt)"
echo "[*] Results saved to $OUTDIR/"
```
Task 4
Technology Fingerprinting
1. Why Technology Fingerprinting Matters
```
Identify Stack → Find Known CVEs → Map Attack Surface → Target Exploits

Example:
  Server: Apache 2.4.49 → CVE-2021-41773 (Path Traversal/RCE)
  PHP: 8.1.0-dev        → Backdoor RCE
  WordPress: 5.7.0      → Known plugin vulnerabilities
  jQuery: 1.6.1         → XSS via jQuery.html()
```
2. HTTP Header Analysis
```bash
# Get all response headers
curl -sI https://target.com

# Key headers for fingerprinting:
# Server: Apache/2.4.51 (Ubuntu)
# X-Powered-By: PHP/8.1.2
# X-AspNet-Version: 4.0.30319
# X-AspNetMvc-Version: 5.2
# X-Generator: Drupal 9
# X-Drupal-Cache: HIT
# X-Drupal-Dynamic-Cache: MISS
# X-WordPress: true
# X-Pingback: xmlrpc.php (WordPress indicator)
# X-Shopify-Stage: production
# X-Wix-Request-Id: ...    (Wix)
# X-GitHub-Request-Id: ... (GitHub Pages)

# Verbose curl output
curl -v https://target.com 2>&1 | grep -iE "(< server|< x-powered|< x-asp|< x-gen|< set-cookie|< x-frame)"

# Check multiple pages
for path in / /login /admin /api /robots.txt; do
    echo "=== $path ==="
    curl -sI "https://target.com$path" | grep -iE "(server|x-powered|x-asp|x-gen)"
done

# Cookie-based fingerprinting
# PHPSESSID             → PHP
# JSESSIONID            → Java (Tomcat/JBoss/etc.)
# ASP.NET_SessionId     → ASP.NET
# laravel_session       → Laravel
# _rails_session        → Ruby on Rails
# connect.sid           → Node.js Express
# csrftoken + sessionid → Django
# ci_session            → CodeIgniter
```
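The cookie-to-stack table above maps directly onto a `case` statement. A minimal sketch — `cookie_stack` is an illustrative helper name; feed it e.g. the output of `curl -sI`:

```shell
# cookie_stack: map a Set-Cookie header (or full header dump) to the
# likely backend per the cookie table above; first match wins
cookie_stack() {
    case "$1" in
        *PHPSESSID*)         echo "PHP" ;;
        *JSESSIONID*)        echo "Java" ;;
        *ASP.NET_SessionId*) echo "ASP.NET" ;;
        *laravel_session*)   echo "Laravel" ;;
        *_rails_session*)    echo "Ruby on Rails" ;;
        *connect.sid*)       echo "Node.js (Express)" ;;
        *csrftoken*)         echo "Django" ;;
        *ci_session*)        echo "CodeIgniter" ;;
        *)                   echo "unknown" ;;
    esac
}

cookie_stack "Set-Cookie: PHPSESSID=abc123; path=/"
```

Example usage: `cookie_stack "$(curl -sI https://target.com)"`.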
3. Automated Fingerprinting Tools
```bash
# WhatWeb - comprehensive web fingerprinting
whatweb https://target.com
whatweb -a 3 https://target.com   # aggressive mode
whatweb -v https://target.com     # verbose
whatweb --input-file=urls.txt --log-json=results.json   # batch

# Wappalyzer CLI
# npm install -g wappalyzer
wappalyzer https://target.com

# httpx with tech detection
echo target.com | httpx -tech-detect -status-code -title -web-server -silent

# Batch tech detection
cat live_hosts.txt | httpx -tech-detect -status-code -title -web-server -silent -o tech_results.txt

# webanalyze (Go-based Wappalyzer)
webanalyze -host https://target.com -crawl 2

# Nuclei technology detection
nuclei -u https://target.com -t technologies/
nuclei -l live_hosts.txt -t technologies/ -o tech_nuclei.txt

# Retire.js - find vulnerable JavaScript libraries
retire --js --jspath /path/to/js/files
retire --node --path /path/to/node/project
```
4. CMS Detection
# WordPress detection
# Indicators:
# /wp-content/, /wp-includes/, /wp-admin/
# /xmlrpc.php, /wp-login.php
# <meta name="generator" content="WordPress 6.x">
# /wp-json/wp/v2/
curl -s https://target.com | grep -i "wp-content\|wordpress"
curl -s https://target.com/wp-json/wp/v2/ | head -5
curl -s https://target.com/readme.html | grep -i version

# WPScan - comprehensive WordPress scanner
wpscan --url https://target.com --enumerate ap,at,u,dbe
wpscan --url https://target.com --api-token $WPSCAN_TOKEN -e vp,vt

# Drupal detection
# /core/, /modules/, /themes/, /sites/default/
# CHANGELOG.txt, /core/CHANGELOG.txt
# X-Generator: Drupal
curl -s https://target.com/CHANGELOG.txt | head -5
curl -s https://target.com/core/CHANGELOG.txt | head -5

# Droopescan
droopescan scan drupal -u https://target.com

# Joomla detection
# /administrator/, /components/, /modules/
# /configuration.php~, /README.txt
# <meta name="generator" content="Joomla!">
curl -s https://target.com | grep -i "joomla"
curl -s https://target.com/administrator/manifests/files/joomla.xml | grep version

# JoomScan
joomscan -u https://target.com

# Magento
# /skin/, /js/, /app/, /var/
# /downloader/, /admin/
magescan scan:all https://target.com

# SharePoint
# /_layouts/, /_vti_bin/
# /_api/web
curl -s https://target.com/_api/web | head -5
5. JavaScript Framework Detection
# Check page source for framework indicators
curl -s https://target.com | grep -oiE "(react|angular|vue|next|nuxt|svelte|ember|backbone)"

# React indicators
# data-reactroot, data-reactid
# __NEXT_DATA__ (Next.js)
# react-app (class names)
# /_next/ paths (Next.js)

# Angular indicators
# ng-version, ng-app, ng-controller
# <app-root>, angular.json
# /assets/ structure

# Vue.js indicators
# data-v-xxxx attributes
# __vue__, v-bind, v-if, v-for
# /static/js/chunk-vendors (Vue CLI)
# /_nuxt/ (Nuxt.js)

# Check JavaScript files
curl -s https://target.com | grep -oP 'src="[^"]*\.js"' | head -20

# Analyze bundled JavaScript
curl -s https://target.com/static/js/main.xxxxx.js | grep -oE "(React|Angular|Vue|jQuery)" | sort -u

# Source map discovery (development builds)
curl -sI https://target.com/static/js/main.js | grep -i "sourcemap"
curl -s https://target.com/static/js/main.js.map

# Webpack bundle analyzer
# Look for /static/js/*.chunk.js patterns
# __webpack_require__ in JS files
6. Server-Side Technology Detection
# PHP detection
curl -sI https://target.com/index.php
curl -s https://target.com/phpinfo.php  # common misconfiguration
# X-Powered-By: PHP/x.x
# PHPSESSID cookie

# Java/Spring detection
# JSESSIONID cookie
# /actuator endpoints (Spring Boot)
curl -s https://target.com/actuator/health
curl -s https://target.com/actuator/env
curl -s https://target.com/actuator/info

# ASP.NET detection
# .aspx, .ashx, .asmx extensions
# ASP.NET_SessionId cookie
# X-AspNet-Version header
# __VIEWSTATE in form

# Python/Django detection
# csrftoken + sessionid cookies
# /admin/ (Django admin)
# "csrfmiddlewaretoken" in forms

# Python/Flask detection
# "session" cookie (signed)
# Werkzeug debugger: /console
curl -s https://target.com/console

# Ruby on Rails detection
# _rails_session cookie
# X-Request-Id header
# /assets/ pipeline
# "authenticity_token" in forms

# Node.js/Express detection
# connect.sid cookie
# X-Powered-By: Express
# ETag format differences

# Go detection
# No session by default
# Specific error page formats
# Chi/Gin/Echo framework headers
7. Web Server Fingerprinting
# Apache version and modules
curl -sI https://target.com | grep Server
# Apache specifics: /server-status, /server-info
curl -s https://target.com/server-status
curl -s https://target.com/server-info

# Nginx
# Server: nginx/1.x
# /nginx_status
curl -s https://target.com/nginx_status

# IIS
# Server: Microsoft-IIS/10.0
# /_vti_bin/, /_vti_inf.html
# aspnet_client folder

# LiteSpeed
# Server: LiteSpeed

# Tomcat
# Server: Apache-Coyote/1.1
# /manager/html (admin)
# /host-manager/html
# /status (server status)

# HTTP methods testing
curl -X OPTIONS -sI https://target.com
nmap --script http-methods -p 80,443 target.com

# 404 page analysis (different servers return different formats)
curl -s https://target.com/nonexistent_page_12345 | head -20
8. WAF/CDN Detection
# wafw00f - WAF detection
wafw00f https://target.com
wafw00f -a https://target.com  # test all WAFs

# Common WAF indicators:

# Cloudflare
# Server: cloudflare
# cf-ray header, __cfduid cookie
# Error page: "Attention Required! | Cloudflare"

# AWS WAF
# x-amzn-requestid header
# 403 with "Request blocked" message

# Akamai
# Server: AkamaiGHost
# X-Akamai-Transformed header

# Imperva/Incapsula
# X-CDN: Imperva
# visid_incap cookie
# incap_ses cookie

# Sucuri
# Server: Sucuri/Cloudproxy
# X-Sucuri-ID header

# ModSecurity
# Server: Apache with mod_security
# 403 Forbidden with ModSecurity message

# F5 BIG-IP
# Server: BigIP
# BIGipServer cookie
# Persistence cookie: BIGipServer~pool_name

# Manual WAF detection
curl -s "https://target.com/?id=1' OR 1=1--" -o /dev/null -w "%{http_code}"
curl -s "https://target.com/?q=<script>alert(1)</script>" -o /dev/null -w "%{http_code}"
# 403/406/429 → WAF likely present
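The manual detection heuristic above (baseline succeeds, attack-shaped probes come back 403/406/429) can be expressed as a small decision function. This is a sketch of the heuristic only; `waf_likely` is a hypothetical helper, and you would feed it the status codes collected by the curl probes:

```python
# Sketch: interpret probe status codes per the heuristic above.
# A WAF is suspected when a benign baseline request succeeds (200)
# but attack-shaped payloads are rejected with 403/406/429.
WAF_BLOCK_CODES = {403, 406, 429}

def waf_likely(baseline_status, probe_statuses):
    """True if the blocking pattern suggests a WAF is filtering requests."""
    if baseline_status != 200:
        # If even the baseline fails, blocking tells us nothing about a WAF.
        return False
    return any(status in WAF_BLOCK_CODES for status in probe_statuses)
```

Note this is only a hint: some applications return 403 for malformed input on their own, so confirm with wafw00f or header fingerprints before concluding a WAF is present.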
9. Version-Specific Vulnerability Mapping
# Once technologies are identified, search for CVEs

# searchsploit (ExploitDB local)
searchsploit apache 2.4.49
searchsploit wordpress 5.7
searchsploit jquery 1.6

# Vulners API
curl -s "https://vulners.com/api/v3/burp/software/?software=apache&version=2.4.49&type=httpd"

# NVD search
# https://nvd.nist.gov/vuln/search?query=apache+2.4.49

# GitHub Advisory Database
# https://github.com/advisories?query=apache+2.4.49

# Nuclei CVE scanning
nuclei -u https://target.com -t cves/
nuclei -u https://target.com -t cves/2023/ -t cves/2024/

# Map identified technologies to known vulns
# Create a table:
# | Technology | Version | CVEs           | Severity |
# |------------|---------|----------------|----------|
# | Apache     | 2.4.49  | CVE-2021-41773 | Critical |
# | PHP        | 7.4.3   | CVE-2024-...   | High     |
# | jQuery     | 1.6.1   | CVE-2020-...   | Medium   |
# | WordPress  | 5.7.0   | CVE-2021-...   | High     |
10. Complete Technology Fingerprinting Script
#!/bin/bash
# tech_fingerprint.sh

TARGET=$1
OUTDIR="recon/${TARGET}/tech"
mkdir -p "$OUTDIR"

echo "=== Technology Fingerprinting: $TARGET ==="

# HTTP headers
echo "[*] Analyzing HTTP headers..."
curl -sI "https://$TARGET" > "$OUTDIR/headers.txt"
curl -sI "http://$TARGET" >> "$OUTDIR/headers.txt" 2>/dev/null

# Extract key headers
grep -iE "(server|x-powered|x-asp|x-generator|x-drupal|x-frame|x-xss|set-cookie|content-security)" \
  "$OUTDIR/headers.txt" > "$OUTDIR/key_headers.txt"

# WhatWeb scan
echo "[*] Running WhatWeb..."
whatweb -a 3 "https://$TARGET" > "$OUTDIR/whatweb.txt" 2>/dev/null

# Check common CMS paths
echo "[*] Checking CMS indicators..."
for path in /wp-login.php /wp-json/wp/v2/ /administrator/ /user/login /wp-content/ \
            /xmlrpc.php /api /graphql /actuator/health /console /server-status; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "https://$TARGET$path" 2>/dev/null)
  if [ "$code" != "404" ] && [ "$code" != "000" ]; then
    echo "[+] $path → $code" >> "$OUTDIR/cms_paths.txt"
  fi
done

# Check robots.txt and sitemap
echo "[*] Checking robots.txt and sitemap..."
curl -s "https://$TARGET/robots.txt" > "$OUTDIR/robots.txt" 2>/dev/null
curl -s "https://$TARGET/sitemap.xml" > "$OUTDIR/sitemap.xml" 2>/dev/null

# JavaScript library detection
echo "[*] Analyzing JavaScript libraries..."
curl -s "https://$TARGET" | grep -oP 'src="[^"]*\.js[^"]*"' > "$OUTDIR/js_files.txt"

# WAF detection
echo "[*] Checking for WAF..."
wafw00f "https://$TARGET" > "$OUTDIR/waf.txt" 2>/dev/null

echo "[*] Fingerprinting complete. Results in $OUTDIR/"
Task 5
Google Dorking and OSINT
1. Google Dorking Operators
| Operator | Description | Example |
|---|---|---|
| site: | Restrict to domain | site:target.com |
| inurl: | URL contains string | inurl:admin |
| intitle: | Page title contains | intitle:"login page" |
| intext: | Page body contains | intext:"password" |
| filetype: | Specific file type | filetype:pdf |
| ext: | File extension | ext:php |
| cache: | Cached version | cache:target.com |
| link: | Pages linking to URL | link:target.com |
| related: | Related sites | related:target.com |
| info: | Site information | info:target.com |
| define: | Definitions | define:injection |
| numrange: | Number range | numrange:1000-2000 |
| daterange: | Date range | daterange:2457388-2457491 |
| OR / \| | Either term | admin OR login |
| AND | Both terms | admin AND password |
| - | Exclude | site:target.com -www |
| " " | Exact match | "index of" |
| * | Wildcard | admin*.target.com |
| .. | Number range | 1..100 |
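The operators in the table compose into query strings. A minimal sketch of a dork builder (`build_dork` is a hypothetical helper; it covers only a few of the operators above):

```python
# Sketch: compose a Google dork from the operators in the table above.
def build_dork(domain, *, inurl=None, intitle=None, filetype=None, exclude=None):
    """Build a dork string anchored on site:, adding optional operators."""
    parts = [f"site:{domain}"]
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')   # quoted for exact-phrase match
    if filetype:
        parts.append(f"filetype:{filetype}")
    if exclude:
        parts.append(f"-{exclude}")            # minus operator excludes a term
    return " ".join(parts)
```

For example, `build_dork("target.com", inurl="admin", intitle="login page")` produces the admin-panel dork used in the next sections.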
2. Sensitive File Discovery
# Configuration files
site:target.com ext:xml | ext:conf | ext:cfg | ext:ini | ext:env | ext:yml | ext:yaml | ext:toml
site:target.com filetype:env
site:target.com filetype:yml "password"

# Database files
site:target.com ext:sql | ext:db | ext:sqlite | ext:mdb
site:target.com filetype:sql "INSERT INTO"
site:target.com filetype:sql "CREATE TABLE"

# Backup files
site:target.com ext:bak | ext:backup | ext:old | ext:temp | ext:swp | ext:save
site:target.com filetype:zip | filetype:tar | filetype:gz | filetype:rar
site:target.com inurl:backup

# Log files
site:target.com ext:log
site:target.com filetype:log "error" | "warning" | "fatal"
site:target.com filetype:log "password" | "user"

# Source code
site:target.com ext:php | ext:asp | ext:aspx | ext:py | ext:rb | ext:java
site:target.com ext:git
site:target.com inurl:.git

# Credentials and secrets
site:target.com "password" | "passwd" | "pwd" filetype:txt
site:target.com "api_key" | "apikey" | "api-key" | "secret_key"
site:target.com "aws_access_key_id" | "aws_secret_access_key"
site:target.com "BEGIN RSA PRIVATE KEY" | "BEGIN OPENSSH PRIVATE KEY"
site:target.com "jdbc:" | "mongodb://" | "mysql://" | "postgresql://"
site:target.com filetype:pem | filetype:key | filetype:ppk
3. Admin Panel and Login Page Discovery
# Admin panels
site:target.com inurl:admin
site:target.com inurl:administrator
site:target.com inurl:admin/login
site:target.com inurl:cpanel
site:target.com inurl:webmin
site:target.com intitle:"admin" intitle:"login"
site:target.com intitle:"dashboard" inurl:admin
site:target.com inurl:wp-admin
site:target.com inurl:administrator/index.php

# Login pages
site:target.com inurl:login | inurl:signin | inurl:auth
site:target.com intitle:"login" | intitle:"sign in"
site:target.com inurl:user/login
site:target.com inurl:account/login

# Registration pages
site:target.com inurl:register | inurl:signup | inurl:join
site:target.com intitle:"register" | intitle:"sign up" | intitle:"create account"

# Password reset
site:target.com inurl:forgot | inurl:reset | inurl:recover
site:target.com intitle:"forgot password" | intitle:"reset password"
4. Vulnerability Discovery Dorks
# SQL Injection indicators
site:target.com inurl:id= | inurl:pid= | inurl:category= | inurl:item=
site:target.com inurl:".php?id="
site:target.com "sql syntax" | "mysql_fetch" | "unclosed quotation"
site:target.com "ORA-" | "Oracle error"
site:target.com "Microsoft OLE DB Provider"
site:target.com "PostgreSQL query failed"
site:target.com "Warning: mysql_" | "Warning: pg_"

# XSS indicators
site:target.com inurl:q= | inurl:search= | inurl:query= | inurl:keyword=
site:target.com inurl:redirect= | inurl:url= | inurl:return= | inurl:next=

# Directory listing
site:target.com intitle:"index of"
site:target.com intitle:"directory listing"
site:target.com intitle:"parent directory"

# Error messages
site:target.com "Fatal error" | "Parse error" | "Syntax error"
site:target.com "stack trace" | "traceback" | "debugging"
site:target.com "Warning:" "on line"
site:target.com intext:"Exception in thread"

# Exposed services
site:target.com intitle:"phpMyAdmin"
site:target.com intitle:"Adminer"
site:target.com intitle:"pgAdmin"
site:target.com inurl:phpmyadmin
site:target.com intitle:"Kibana"
site:target.com intitle:"Grafana"
site:target.com intitle:"Jenkins"
site:target.com intitle:"GitLab"
5. API and Documentation Discovery
# API documentation
site:target.com inurl:swagger | inurl:api-docs | inurl:openapi
site:target.com filetype:json "openapi" | "swagger"
site:target.com intitle:"Swagger UI"
site:target.com inurl:graphql | inurl:graphiql
site:target.com inurl:api/v1 | inurl:api/v2 | inurl:api/v3
site:target.com filetype:yaml "paths:" "info:"
site:target.com inurl:apidocs | inurl:api-reference

# Development/staging environments
site:target.com inurl:dev | inurl:staging | inurl:test | inurl:uat
site:target.com inurl:beta | inurl:alpha | inurl:sandbox
site:*.dev.target.com
site:*.staging.target.com
site:*.test.target.com

# Internal documentation
site:target.com filetype:pdf "internal" | "confidential" | "draft"
site:target.com filetype:doc | filetype:docx "internal use only"
site:target.com filetype:xlsx "employee" | "salary" | "password"
6. GitHub OSINT
# GitHub Dorking - search for secrets in code

# Organization-wide search
# org:target-org password
# org:target-org secret
# org:target-org api_key
# org:target-org token
# org:target-org AWS_ACCESS_KEY
# org:target-org private_key
# org:target-org credentials

# Specific file searches
# org:target-org filename:.env
# org:target-org filename:wp-config.php
# org:target-org filename:configuration.php
# org:target-org filename:config.py
# org:target-org filename:.htpasswd
# org:target-org filename:id_rsa
# org:target-org filename:shadow
# org:target-org filename:credentials
# org:target-org filename:docker-compose.yml

# Extension searches
# org:target-org extension:pem
# org:target-org extension:key
# org:target-org extension:env
# org:target-org extension:sql

# Automated GitHub dorking tools

# GitDorker
python3 GitDorker.py -t $GITHUB_TOKEN -org target-org -d dorks/medium_dorks.txt

# truffleHog - find secrets in git history
trufflehog github --org=target-org --json > trufflehog_results.json

# gitleaks
gitleaks detect --source=/path/to/repo --report-path=gitleaks_report.json

# git-secrets
git secrets --scan

# shhgit - find secrets in real-time
shhgit --search-query "target.com"
7. Leaked Credentials and Data
# Check breach databases (authorized use only)

# Have I Been Pwned API
curl -s "https://haveibeenpwned.com/api/v3/breachedaccount/[email protected]" \
  -H "hibp-api-key: $HIBP_KEY"

# DeHashed
# https://dehashed.com - search by domain, email, username

# IntelligenceX
# https://intelx.io - comprehensive search

# Leaked paste search
# Pastebin, GitHub Gists, other paste sites
# Search: "target.com" on IntelligenceX phonebook

# Credential stuffing wordlists
# Use discovered emails with common password patterns

# Check for password patterns
# company123, Company2024!, Target@123
# Season+Year: Summer2024!, Winter2023
# Month+Year: January2024!
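The pattern families above (company+digits, Season+Year) can be expanded into a candidate wordlist programmatically. A minimal sketch for authorized testing only; `season_year_candidates` and `company_candidates` are hypothetical helper names and the pattern set is just the examples from this section:

```python
# Sketch: expand the password-pattern families listed above into candidates.
import itertools

def season_year_candidates(years, suffixes=("", "!")):
    """Season+Year patterns, e.g. Summer2024 and Summer2024!."""
    seasons = ["Spring", "Summer", "Autumn", "Winter"]
    return [f"{s}{y}{x}" for s, y, x in itertools.product(seasons, years, suffixes)]

def company_candidates(company, years):
    """Company-name patterns, e.g. Company123, Company@123, Company2024!."""
    base = company.capitalize()
    out = [f"{base}123", f"{base}@123"]
    out += [f"{base}{y}!" for y in years]
    return out
```

The resulting lists feed directly into a password-spraying wordlist alongside emails discovered during OSINT.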
8. Social Media OSINT
# LinkedIn reconnaissance
# Search: "target company" employees
# Extract: names, roles, technologies mentioned
# Job postings reveal: tech stack, internal tools, security measures

# Name to email generation
# [email protected]
# [email protected]
# [email protected]
# [email protected]

# Tools:

# linkedin2username
python3 linkedin2username.py -c "Target Company" -d target.com

# CrossLinked
crosslinked -f '{first}.{last}@target.com' -t 'Target Company' -j 2

# Email verification
# emailhippo.com
# hunter.io/email-verifier

# Twitter/X OSINT
# Search for @target_company
# Monitor employee tweets about technology
# Search for leaked info: "target.com" "password"

# Instagram, Facebook (company pages)
# Employee posts revealing office setup, badge designs, etc.
9. OSINT Frameworks and Tools
# theHarvester - multi-source OSINT
theHarvester -d target.com -b all -l 500
# Sources: google, bing, linkedin, twitter, virustotal,
# certspotter, crtsh, rapiddns, sublist3r, etc.

# Recon-ng - OSINT framework
recon-ng
# [recon-ng][default] > marketplace install all
# [recon-ng][default] > workspaces create target
# [recon-ng][target] > modules search
# [recon-ng][target] > modules load recon/domains-hosts/google_site_web
# [recon-ng][target] > options set SOURCE target.com
# [recon-ng][target] > run

# SpiderFoot - automated OSINT
spiderfoot -s target.com -o output.html
# Web UI: spiderfoot -l 127.0.0.1:5001

# Maltego - visual link analysis
# Community edition available
# Transform-based OSINT

# OSINT Framework
# https://osintframework.com - comprehensive tool directory

# Shodan
shodan search "hostname:target.com"
shodan search "org:\"Target Organization\""
shodan search "ssl.cert.subject.cn:target.com"

# Censys
# https://search.censys.io
10. Automated OSINT Pipeline
#!/bin/bash
# osint_pipeline.sh

TARGET=$1
OUTDIR="recon/${TARGET}/osint"
mkdir -p "$OUTDIR"

echo "=== OSINT Pipeline: $TARGET ==="

# theHarvester
echo "[*] Running theHarvester..."
theHarvester -d $TARGET -b google,bing,crtsh,virustotal,rapiddns -l 500 \
  -f $OUTDIR/harvester 2>/dev/null

# Check for exposed git repos
echo "[*] Checking for exposed .git..."
for sub in $(cat recon/$TARGET/subdomains/all_subdomains.txt 2>/dev/null); do
  code=$(curl -s -o /dev/null -w "%{http_code}" "https://$sub/.git/config" 2>/dev/null)
  if [ "$code" = "200" ]; then
    echo "[+] Exposed .git found: https://$sub/.git/" >> $OUTDIR/exposed_git.txt
  fi
done

# Check for sensitive files
echo "[*] Checking for sensitive files..."
SENSITIVE_PATHS=".env .env.bak wp-config.php.bak .htpasswd .DS_Store config.php.bak web.config.bak phpinfo.php info.php server-status elmah.axd trace.axd .svn/entries crossdomain.xml"
for path in $SENSITIVE_PATHS; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "https://$TARGET/$path" 2>/dev/null)
  if [ "$code" = "200" ]; then
    echo "[+] Found: https://$TARGET/$path" >> $OUTDIR/sensitive_files.txt
  fi
done

# robots.txt and sitemap analysis
echo "[*] Analyzing robots.txt and sitemap..."
curl -s "https://$TARGET/robots.txt" > $OUTDIR/robots.txt 2>/dev/null
curl -s "https://$TARGET/sitemap.xml" > $OUTDIR/sitemap.xml 2>/dev/null

# Extract disallowed paths from robots.txt
grep "Disallow:" $OUTDIR/robots.txt 2>/dev/null | awk '{print $2}' > $OUTDIR/disallowed_paths.txt

echo "[*] OSINT pipeline complete. Results in $OUTDIR/"
Task 6
Wayback Machine and Historical Analysis
1. Web Archive Sources
# Wayback Machine (archive.org)
# The largest web archive, with billions of snapshots

# waybackurls - fetch URLs from Wayback Machine
waybackurls target.com > wayback_urls.txt
waybackurls target.com | sort -u | tee wayback_unique.txt

# gau (GetAllUrls) - multiple sources
# Sources: Wayback Machine, Common Crawl, AlienVault OTX, URLScan
gau target.com > gau_urls.txt
gau --subs target.com > gau_with_subs.txt
gau --providers wayback,commoncrawl,otx,urlscan target.com > gau_all.txt

# waymore - comprehensive URL collection
waymore -i target.com -mode U -oU waymore_urls.txt

# Combine all sources
cat wayback_urls.txt gau_urls.txt waymore_urls.txt | sort -u > all_historical_urls.txt
echo "[*] Total unique historical URLs: $(wc -l < all_historical_urls.txt)"
2. URL Filtering and Analysis
# Filter by file extension
cat all_historical_urls.txt | grep -iE "\.(php|asp|aspx|jsp|jspx|do|action)(\?|$)" > dynamic_pages.txt
cat all_historical_urls.txt | grep -iE "\.(js|json|xml|yaml|yml|config|env|bak|sql|log)" > sensitive_files.txt
cat all_historical_urls.txt | grep -iE "\.(pdf|doc|docx|xls|xlsx|ppt|pptx|csv)" > documents.txt
cat all_historical_urls.txt | grep -iE "\.(zip|tar|gz|rar|7z|bak|backup|old)" > archives.txt

# Filter by patterns
cat all_historical_urls.txt | grep -iE "(admin|login|dashboard|panel|portal|manage)" > admin_urls.txt
cat all_historical_urls.txt | grep -iE "(api|graphql|rest|v1|v2|v3|endpoint)" > api_urls.txt
cat all_historical_urls.txt | grep -iE "(upload|file|download|export|import)" > file_handling.txt
cat all_historical_urls.txt | grep -iE "(debug|test|dev|staging|internal)" > dev_urls.txt
cat all_historical_urls.txt | grep -iE "(config|setup|install|phpinfo|server-status)" > config_urls.txt

# Extract parameters
cat all_historical_urls.txt | grep "?" | sort -u > parameterized.txt
cat parameterized.txt | grep -oP '\?[^&]*' | cut -d= -f1 | sort | uniq -c | sort -rn > param_names.txt

# uro - URL deduplication and optimization
cat all_historical_urls.txt | uro > optimized_urls.txt

# unfurl - parse and extract URL components
cat all_historical_urls.txt | unfurl --unique domains > domains.txt
cat all_historical_urls.txt | unfurl --unique paths > paths.txt
cat all_historical_urls.txt | unfurl --unique keys > parameter_keys.txt
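The grep/cut parameter-counting pipeline above can also be done with `urllib.parse`, which handles URL encoding and multiple parameters more reliably than text splitting. A minimal sketch (`param_frequency` is a hypothetical helper):

```python
# Sketch: count parameter-name frequency across collected URLs,
# using urllib.parse instead of the grep/cut pipeline above.
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

def param_frequency(urls):
    """Return a Counter of parameter names seen in the URLs' query strings."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        for key, _value in parse_qsl(query, keep_blank_values=True):
            counts[key] += 1
    return counts

freq = param_frequency([
    "https://target.com/item.php?id=1&cat=2",
    "https://target.com/view.php?id=9",
])
```

`freq.most_common()` then gives the same ranked list as `sort | uniq -c | sort -rn`, with the most frequently seen parameters first.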
3. Discovering Removed/Hidden Content
# Wayback Machine CDX API
# Fetch all snapshots for a URL
curl -s "https://web.archive.org/cdx/search/cdx?url=target.com/*&output=json&fl=timestamp,original,statuscode,mimetype&collapse=urlkey" | \
  jq -r '.[] | @tsv' > cdx_results.tsv

# Find pages that existed before but return 404 now
while read url; do
  current_code=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null)
  if [ "$current_code" = "404" ]; then
    echo "[REMOVED] $url" >> removed_pages.txt
  fi
done < dynamic_pages.txt

# View historical snapshots
# https://web.archive.org/web/2020*/target.com
# https://web.archive.org/web/20200101000000*/target.com/admin

# Download historical version
curl -s "https://web.archive.org/web/2020/https://target.com/admin" > historical_admin.html

# Find old API endpoints
cat api_urls.txt | while read url; do
  current_code=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null)
  echo "$current_code $url"
done > api_status_check.txt

# Look for removed functionality
# Old registration pages, admin panels, debug endpoints
# These may still be accessible but hidden from navigation
4. JavaScript File Analysis from Archives
# Extract JavaScript URLs
cat all_historical_urls.txt | grep -iE "\.js(\?|$)" | sort -u > js_files.txt

# Download current versions
while read js_url; do
  filename=$(echo "$js_url" | md5sum | cut -d' ' -f1).js
  curl -s "$js_url" -o "js_current/$filename" 2>/dev/null
done < js_files.txt

# Compare with archived versions
# Look for:
# - Removed API endpoints
# - Changed authentication logic
# - Removed debug code
# - Hidden admin functionality
# - API keys/tokens in old versions

# LinkFinder - extract endpoints from JS
linkfinder -i https://target.com -o cli > linkfinder_results.txt
linkfinder -i https://target.com -d -o linkfinder_output.html

# SecretFinder - find secrets in JS
SecretFinder -i https://target.com -o cli > secrets_in_js.txt

# Analyze JS for sensitive data
cat js_current/*.js | grep -oiE "(api[_-]?key|api[_-]?secret|token|password|secret|credential|auth)['\"][:\s]*['\"][^'\"]{8,}" > js_secrets.txt

# Find API endpoints in JS
cat js_current/*.js | grep -oiE "(https?://[^\s'\"]+|/api/[^\s'\"]+|/v[0-9]/[^\s'\"]+)" | sort -u > js_endpoints.txt
5. Source Code Recovery from Archives
# Common source code leaks in archives

# .git directory exposure
curl -s "https://web.archive.org/web/2020/https://target.com/.git/config"
curl -s "https://web.archive.org/web/2020/https://target.com/.git/HEAD"

# .svn directory
curl -s "https://web.archive.org/web/2020/https://target.com/.svn/entries"

# .env file
curl -s "https://web.archive.org/web/2020/https://target.com/.env"

# Configuration files
curl -s "https://web.archive.org/web/2020/https://target.com/web.config"
curl -s "https://web.archive.org/web/2020/https://target.com/wp-config.php"

# Check multiple timestamps for each sensitive file
SENSITIVE_FILES=".env .git/config wp-config.php web.config .htaccess config.php database.yml settings.py"
for file in $SENSITIVE_FILES; do
  echo "=== $file ===" >> archived_configs.txt
  curl -s "https://web.archive.org/cdx/search/cdx?url=target.com/$file&output=json" >> archived_configs.txt
done
6. Technology Change Tracking
# Track technology changes over time
# Useful for understanding the evolution of the target

# Check old versions of the site
# https://web.archive.org/web/20200101/target.com
# https://web.archive.org/web/20210101/target.com
# https://web.archive.org/web/20220101/target.com

# Extract technology indicators from each snapshot
for year in 2019 2020 2021 2022 2023 2024; do
  echo "=== $year ===" >> tech_timeline.txt
  content=$(curl -s "https://web.archive.org/web/${year}0601/https://target.com" 2>/dev/null)
  echo "$content" | grep -oiE "(wordpress|drupal|joomla|react|angular|vue|jquery|bootstrap|laravel|django|rails|express|next|nuxt)" | sort -u >> tech_timeline.txt
done

# Track subdomain changes
# Compare historical subdomain lists with current
# New subdomains = potential new attack surface
# Removed subdomains = potential takeover candidates
7. Sitemap and Robots.txt History
# Historical robots.txt
curl -s "https://web.archive.org/web/2020/https://target.com/robots.txt"
curl -s "https://web.archive.org/web/2021/https://target.com/robots.txt"
curl -s "https://web.archive.org/web/2022/https://target.com/robots.txt"

# Compare robots.txt versions
# Removed Disallow entries might reveal hidden paths
# that were previously blocked

# Historical sitemaps
curl -s "https://web.archive.org/web/2020/https://target.com/sitemap.xml"

# Extract all paths from historical sitemaps
curl -s "https://web.archive.org/web/2020/https://target.com/sitemap.xml" | \
  grep -oP '<loc>\K[^<]+' | sort -u > sitemap_2020_paths.txt
curl -s "https://target.com/sitemap.xml" | \
  grep -oP '<loc>\K[^<]+' | sort -u > sitemap_current_paths.txt

# Find pages removed from sitemap
comm -23 sitemap_2020_paths.txt sitemap_current_paths.txt > removed_from_sitemap.txt
8. Common Crawl Analysis
# Common Crawl - free web crawl data
# https://commoncrawl.org

# Search Common Crawl index
curl -s "https://index.commoncrawl.org/CC-MAIN-2024-10-index?url=*.target.com&output=json" | \
  jq -r '.url' | sort -u > commoncrawl_urls.txt

# Download WARC records for specific URLs
# Useful for getting full HTTP responses including headers

# cdx_toolkit (Python)
# pip install cdx_toolkit
python3 << 'PYEOF'
import cdx_toolkit
cdx = cdx_toolkit.CDXFetcher(source='cc')
for obj in cdx.iter('target.com/*', limit=1000):
    print(obj['url'])
PYEOF

# URLScan.io
curl -s "https://urlscan.io/api/v1/search/?q=domain:target.com" | \
  jq -r '.results[].page.url' | sort -u > urlscan_urls.txt
9. Parameter Mining from Historical Data
# Extract all unique parameters
cat all_historical_urls.txt | grep "?" | \
  grep -oP '[?&]\K[^=]+' | sort | uniq -c | sort -rn > all_params.txt

# High-value parameters for injection testing
# id, user_id, item_id → IDOR
# url, redirect, next, return, goto → Open Redirect / SSRF
# search, q, query, keyword → XSS / SQLi
# file, path, page, template → LFI/RFI
# cmd, exec, command → Command Injection
# email, username, user → Account Enumeration

# Create parameter-based test URLs
cat all_historical_urls.txt | grep "?" | \
  qsreplace "FUZZ" | sort -u > fuzzable_urls.txt

# Arjun - parameter discovery
arjun -u https://target.com/page -oJ arjun_params.json
arjun -i parameterized.txt -oJ arjun_batch.json

# x8 - hidden parameter discovery
x8 -u https://target.com/page -w /usr/share/seclists/Discovery/Web-Content/burp-parameter-names.txt
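The high-value parameter list above amounts to a classification table, so mined parameter names can be bucketed automatically. A minimal sketch (`PARAM_CLASSES` and `classify_param` are hypothetical names; the categories and parameter names are exactly those listed in this section):

```python
# Sketch: bucket discovered parameter names into the vulnerability
# classes from the list above.
PARAM_CLASSES = {
    "IDOR": {"id", "user_id", "item_id"},
    "Open Redirect / SSRF": {"url", "redirect", "next", "return", "goto"},
    "XSS / SQLi": {"search", "q", "query", "keyword"},
    "LFI/RFI": {"file", "path", "page", "template"},
    "Command Injection": {"cmd", "exec", "command"},
    "Account Enumeration": {"email", "username", "user"},
}

def classify_param(name):
    """Return every vulnerability class whose name list contains this parameter."""
    n = name.lower()
    return [cls for cls, names in PARAM_CLASSES.items() if n in names]
```

Running this over `all_params.txt` prioritizes which historical URLs to feed into injection testing first; unmatched names return an empty list.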
10. Complete Historical Analysis Script
#!/bin/bash
# historical_analysis.sh

TARGET=$1
OUTDIR="recon/${TARGET}/historical"
mkdir -p $OUTDIR/{urls,js,configs,params}

echo "=== Historical Analysis: $TARGET ==="

# Collect URLs from all sources
echo "[1/7] Collecting historical URLs..."
waybackurls $TARGET 2>/dev/null > $OUTDIR/urls/wayback.txt
gau $TARGET 2>/dev/null > $OUTDIR/urls/gau.txt
cat $OUTDIR/urls/*.txt | sort -u > $OUTDIR/urls/all.txt
echo "[*] Total URLs: $(wc -l < $OUTDIR/urls/all.txt)"

# Filter and categorize
echo "[2/7] Categorizing URLs..."
cat $OUTDIR/urls/all.txt | grep -iE "\.js(\?|$)" | sort -u > $OUTDIR/js/js_files.txt
cat $OUTDIR/urls/all.txt | grep -iE "\.(php|asp|aspx|jsp)(\?|$)" | sort -u > $OUTDIR/urls/dynamic.txt
cat $OUTDIR/urls/all.txt | grep -iE "(api|graphql|v[0-9])" | sort -u > $OUTDIR/urls/api.txt
cat $OUTDIR/urls/all.txt | grep "?" | sort -u > $OUTDIR/urls/parameterized.txt

# Extract parameters
echo "[3/7] Extracting parameters..."
cat $OUTDIR/urls/parameterized.txt | grep -oP '[?&]\K[^=]+' | \
  sort | uniq -c | sort -rn > $OUTDIR/params/all_params.txt

# Check for sensitive files in archives
echo "[4/7] Checking archived sensitive files..."
for file in .env .git/config .git/HEAD wp-config.php web.config .htaccess \
            config.php .svn/entries phpinfo.php server-status; do
  result=$(curl -s -o /dev/null -w "%{http_code}" \
    "https://web.archive.org/web/2023/https://$TARGET/$file" 2>/dev/null)
  if [ "$result" = "200" ]; then
    echo "[+] Archived: $file" >> $OUTDIR/configs/archived_sensitive.txt
  fi
done

# Historical robots.txt
echo "[5/7] Checking historical robots.txt..."
for year in 2019 2020 2021 2022 2023 2024; do
  curl -s "https://web.archive.org/web/${year}0601/https://$TARGET/robots.txt" \
    > $OUTDIR/configs/robots_${year}.txt 2>/dev/null
done

# Extract disallowed paths from all versions
cat $OUTDIR/configs/robots_*.txt 2>/dev/null | grep "Disallow:" | \
  awk '{print $2}' | sort -u > $OUTDIR/configs/all_disallowed.txt

# Alive check on interesting URLs
echo "[6/7] Checking alive status..."
cat $OUTDIR/urls/api.txt | httpx -silent -status-code > $OUTDIR/urls/alive_api.txt 2>/dev/null

# Generate report
echo "[7/7] Generating summary..."
echo "=== Historical Analysis Summary ===" > $OUTDIR/summary.txt
echo "Total URLs: $(wc -l < $OUTDIR/urls/all.txt)" >> $OUTDIR/summary.txt
echo "JS Files: $(wc -l < $OUTDIR/js/js_files.txt)" >> $OUTDIR/summary.txt
echo "Dynamic Pages: $(wc -l < $OUTDIR/urls/dynamic.txt)" >> $OUTDIR/summary.txt
echo "API Endpoints: $(wc -l < $OUTDIR/urls/api.txt)" >> $OUTDIR/summary.txt
echo "Unique Parameters: $(wc -l < $OUTDIR/params/all_params.txt)" >> $OUTDIR/summary.txt

echo "[*] Historical analysis complete. Results in $OUTDIR/"
cat $OUTDIR/summary.txt
Task 7
JavaScript File Analysis
1. Why Analyze JavaScript Files
JavaScript files contain:
├── API endpoints and routes
├── Authentication logic (client-side)
├── API keys, tokens, secrets
├── Hidden admin functionality
├── Business logic implementation
├── WebSocket endpoints
├── Internal domain references
├── Debug/development code
├── Third-party integrations
└── Source maps (full source code)
2. Finding JavaScript Files
# Extract from HTML source
curl -s https://target.com | grep -oP 'src="[^"]*\.js[^"]*"' | sed 's/src="//;s/"//'
curl -s https://target.com | grep -oP "src='[^']*\.js[^']*'" | sed "s/src='//;s/'//"

# Find JS in multiple pages
for page in / /login /dashboard /api /about; do
  curl -s "https://target.com$page" | grep -oP 'src="[^"]*\.js[^"]*"'
done | sort -u > js_urls.txt

# Historical JS files
cat wayback_urls.txt | grep -iE "\.js(\?|$)" | sort -u > historical_js.txt

# getJS - extract JS URLs
getJS --url https://target.com --complete > getjs_results.txt

# gospider - web spider for JS discovery
gospider -s https://target.com -c 10 -d 2 --js > gospider_js.txt

# hakrawler - fast web crawler
echo https://target.com | hakrawler -js > hakrawler_js.txt

# Resolve relative paths to absolute
while read js; do
  case "$js" in
    http*) echo "$js" ;;
    //*)   echo "https:$js" ;;
    /*)    echo "https://target.com$js" ;;
    *)     echo "https://target.com/$js" ;;
  esac
done < js_urls.txt > js_absolute.txt
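The relative-to-absolute resolution step above is exactly what `urllib.parse.urljoin` implements, including the scheme-relative `//host/...` case. A minimal sketch (`resolve_js` is a hypothetical helper):

```python
# Sketch: resolve src attributes to absolute URLs, as in the case
# statement above, using the standard library.
from urllib.parse import urljoin

def resolve_js(base, src):
    """Resolve a script src against the page URL it was found on."""
    # urljoin handles absolute URLs, scheme-relative //host/path,
    # root-relative /path, and plain relative paths in one call.
    return urljoin(base, src)
```

Using `urljoin` also gets subtleties right that the shell version ignores, such as resolving `app.js` relative to `/some/dir/page.html` instead of the site root.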
3. Endpoint Extraction
# LinkFinder - extract endpoints from JS
python3 linkfinder.py -i https://target.com -o cli
python3 linkfinder.py -i https://target.com -d -o results.html  # full domain
python3 linkfinder.py -i /path/to/file.js -o cli  # local file

# Manual regex extraction
curl -s https://target.com/app.js | grep -oP '["'"'"'](/[a-zA-Z0-9_/\-\.]+)["'"'"']' | sort -u

# Extract API paths
curl -s https://target.com/app.js | grep -oiE '["'"'"'](\/api\/[^"'"'"']+)["'"'"']' | sort -u

# Extract full URLs
curl -s https://target.com/app.js | grep -oiE 'https?://[^\s"'"'"'<>]+' | sort -u

# Extract fetch/axios/XMLHttpRequest calls
curl -s https://target.com/app.js | grep -oP 'fetch\(["\x27][^"\x27]+["\x27]' | sort -u
curl -s https://target.com/app.js | grep -oP 'axios\.(get|post|put|delete|patch)\(["\x27][^"\x27]+' | sort -u
curl -s https://target.com/app.js | grep -oP '\.open\(["\x27](GET|POST|PUT|DELETE)["\x27],\s*["\x27][^"\x27]+' | sort -u

# Extract route definitions (React Router, Vue Router, Angular)
curl -s https://target.com/app.js | grep -oP 'path:\s*["\x27][^"\x27]+["\x27]' | sort -u
curl -s https://target.com/app.js | grep -oP 'route\(["\x27][^"\x27]+["\x27]' | sort -u
4. Secret and Credential Discovery
# SecretFinder
python3 SecretFinder.py -i https://target.com/app.js -o cli
python3 SecretFinder.py -i https://target.com -e -o results.html   # crawl entire domain

# Manual regex patterns for secrets
JS_FILE="https://target.com/app.js"

# API Keys
curl -s "$JS_FILE" | grep -oiE '(api[_-]?key|apikey)["\x27:\s]*["\x27][a-zA-Z0-9_\-]{16,}["\x27]'

# AWS Keys
curl -s "$JS_FILE" | grep -oiE '(AKIA[0-9A-Z]{16})'
curl -s "$JS_FILE" | grep -oiE '(aws[_-]?secret[_-]?access[_-]?key)["\x27:\s]*["\x27][^\x27"]{20,}["\x27]'

# Google API Key
curl -s "$JS_FILE" | grep -oiE 'AIza[0-9A-Za-z_\-]{35}'

# Firebase
curl -s "$JS_FILE" | grep -oiE '(firebase[a-zA-Z]*\.com|firebaseio\.com)'

# JWT tokens
curl -s "$JS_FILE" | grep -oiE 'eyJ[a-zA-Z0-9_-]*\.eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*'

# Generic tokens/passwords
curl -s "$JS_FILE" | grep -oiE '(token|password|secret|credential|auth)["\x27:\s]*["\x27][^\x27"]{8,}["\x27]'

# Private keys
curl -s "$JS_FILE" | grep -oiE 'BEGIN (RSA |EC |DSA )?PRIVATE KEY'

# Slack tokens
curl -s "$JS_FILE" | grep -oiE 'xox[baprs]-[0-9a-zA-Z\-]+'

# GitHub tokens
curl -s "$JS_FILE" | grep -oiE 'gh[pousr]_[A-Za-z0-9_]{36,}'

# nuclei JS secret detection
nuclei -u https://target.com -t exposures/tokens/
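The JWT pattern can be checked against a structurally valid dummy token before a live run (the token below is fabricated for the test, not a real credential):

```shell
# Three dot-separated base64url-style segments, the first two starting
# with "eyJ" (base64 of '{"'), exactly what the regex keys on
echo 'var t = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.c2ln";' |
    grep -oE 'eyJ[a-zA-Z0-9_-]*\.eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*'
```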
5. Source Map Analysis
# Source maps contain the original, unminified source code
# Usually at: file.js.map or referenced in the JS file

# Check for source map reference in JS
curl -s https://target.com/app.js | tail -5
# Look for: //# sourceMappingURL=app.js.map

# Check source map header
curl -sI https://target.com/app.js | grep -i "sourcemap"
# SourceMap: /app.js.map

# Download source map
curl -s https://target.com/app.js.map -o app.js.map

# Common source map paths
# /static/js/main.xxxxx.js.map
# /assets/app.js.map
# /dist/bundle.js.map
# /build/static/js/main.chunk.js.map

# Extract original source from source map
# unwebpack-sourcemap
python3 unwebpack_sourcemap.py --make-directory app.js.map output_dir/

# sourcemapper
sourcemapper -url https://target.com/app.js.map -output source_code/

# smap - source map extractor
smap https://target.com/app.js.map -o source_output/

# After extraction, analyze the full source code
# Look for: API endpoints, auth logic, admin routes, hardcoded secrets
grep -rn "api" source_output/
grep -rn "password\|secret\|token\|key" source_output/
grep -rn "admin\|isAdmin\|role" source_output/
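A quick way to pull the referenced map path out of a downloaded bundle is a `\K` grep; the sample file written below is fabricated for the demonstration:

```shell
# Write a fake minified file, then extract the referenced map path.
# \K discards everything matched so far, leaving only the URL itself.
printf 'console.log(1);\n//# sourceMappingURL=app.js.map\n' > /tmp/sample.js
grep -oP 'sourceMappingURL=\K\S+' /tmp/sample.js
```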
6. Webpack Bundle Analysis
# Identify Webpack bundles
# Look for: __webpack_require__, webpackJsonp, webpackChunkapp

# Extract chunk names/IDs
curl -s https://target.com/app.js | grep -oP 'webpackChunk[a-zA-Z_]*'

# Find all chunks
curl -s https://target.com | grep -oP '/static/js/[^"]+\.js' | sort -u

# Download all chunks
for chunk in $(curl -s https://target.com | grep -oP '/static/js/[^"]+\.js'); do
    wget -q "https://target.com$chunk" -P webpack_chunks/
done

# Analyze chunks for interesting content
for f in webpack_chunks/*.js; do
    echo "=== $f ===" >> webpack_analysis.txt
    # Routes
    grep -oP 'path:\s*["'"'"'][^"'"'"']+' "$f" >> webpack_analysis.txt 2>/dev/null
    # API calls
    grep -oP '["'"'"']/api/[^"'"'"']+' "$f" >> webpack_analysis.txt 2>/dev/null
    # Secrets
    grep -oiP '(api_key|secret|token|password)\s*[:=]\s*["'"'"'][^"'"'"']+' "$f" >> webpack_analysis.txt 2>/dev/null
done

# webpack-exploder - analyze webpack bundles
# Deobfuscate and separate modules
7. Minified/Obfuscated JS Analysis
# Beautify minified JavaScript
# js-beautify
js-beautify -f minified.js -o beautified.js

# Online tools:
# https://beautifier.io
# https://prettier.io

# de4js - JavaScript deobfuscator
# Handles: eval, packed, obfuscator.io, JSFuck

# Common obfuscation patterns:

# eval-based
# eval(function(p,a,c,k,e,d){...})

# String array obfuscation
# var _0x1234=['api','key','secret']; function _0x5678(_0x1234,_0x9abc){...}

# JSFuck
# [][(![]+[])[+[]]+(![]+[])[!+[]+!+[]]...

# JJencode / AAencode
# $=~[];$={___:++$...

# Deobfuscation approaches:
# 1. Use browser console to execute and capture output
# 2. Replace eval() with console.log() to see decoded code
# 3. Use AST-based deobfuscation tools
# 4. Synchrony deobfuscator (targets obfuscator.io output)
# 5. Manual analysis with breakpoints
8. DOM Sink Analysis
# Identify potential DOM XSS sinks in JavaScript

# Dangerous sinks to search for:
SINKS="innerHTML|outerHTML|document\.write|document\.writeln|eval|setTimeout|setInterval|Function|execScript|\.html\(|\.append\(|\.prepend\(|\.after\(|\.before\(|\.replaceWith\(|location\.href|location\.assign|location\.replace|window\.open|\.src\s*=|\.href\s*="

# Search JavaScript files for sinks
curl -s https://target.com/app.js | grep -oiE "$SINKS" | sort | uniq -c | sort -rn

# Dangerous sources (user-controllable input):
SOURCES="location\.hash|location\.search|location\.href|document\.URL|document\.documentURI|document\.referrer|window\.name|document\.cookie|postMessage|\.value"

# Search for sources
curl -s https://target.com/app.js | grep -oiE "$SOURCES" | sort | uniq -c | sort -rn

# Look for direct source-to-sink flows
# Example: document.getElementById('output').innerHTML = location.hash
# This is a DOM XSS vulnerability
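A crude first pass at source-to-sink flows is to intersect the two greps line-wise; the input below is fabricated to show the idea:

```shell
# Keep only lines matching both a user-controlled source
# and a dangerous sink (subset of the patterns above)
printf 'el.innerHTML = location.hash;\nconsole.log("safe");\n' |
    grep -E 'location\.(hash|search)|document\.cookie|window\.name' |
    grep -E 'innerHTML|outerHTML|document\.write|eval'
```

Line-level matching produces false positives (a source and sink on the same line need not be connected) and misses flows spanning multiple lines, so treat it as triage before manual review with browser devtools.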
9. Third-Party Library Analysis
# Identify third-party libraries and versions

# Common CDN patterns
curl -s https://target.com | grep -oP 'https://cdn[^"'"'"']+' | sort -u
curl -s https://target.com | grep -oP 'https://cdnjs[^"'"'"']+' | sort -u
curl -s https://target.com | grep -oP 'https://unpkg[^"'"'"']+' | sort -u

# Extract library versions from JS files
# jQuery
curl -s https://target.com/jquery.min.js | head -5 | grep -oP 'v\d+\.\d+\.\d+'

# Check for known vulnerable versions
# retire.js - identify vulnerable JS libraries (scans local files,
# so download the JS first, then point it at the directory)
retire --path downloaded_js/

# Snyk vulnerability database
# https://snyk.io/vuln

# Known vulnerable versions:
# jQuery < 3.5.0 → XSS vulnerabilities
# Angular.js 1.x → template injection, sandbox bypass
# Lodash < 4.17.12 → prototype pollution
# Moment.js → ReDoS
# DOMPurify < 2.0.17 → mXSS bypass
# handlebars < 4.7.7 → prototype pollution

# Check all libraries with nuclei
nuclei -u https://target.com -t technologies/ -t exposures/
10. Complete JS Analysis Pipeline
#!/bin/bash
# js_analysis.sh
TARGET=$1
OUTDIR="recon/${TARGET}/js_analysis"
mkdir -p $OUTDIR/{files,endpoints,secrets,sourcemaps,beautified}

echo "=== JavaScript Analysis: $TARGET ==="

# Collect JS files
echo "[1/6] Collecting JavaScript files..."
curl -s "https://$TARGET" | grep -oP 'src="[^"]*\.js[^"]*"' | \
    sed 's/src="//' | sed 's/"//' | while read js; do
        case "$js" in
            http*) echo "$js" ;;
            //*)   echo "https:$js" ;;
            /*)    echo "https://$TARGET$js" ;;
            *)     echo "https://$TARGET/$js" ;;
        esac
    done | sort -u > $OUTDIR/js_urls.txt
echo "[*] Found $(wc -l < $OUTDIR/js_urls.txt) JS files"

# Download JS files
echo "[2/6] Downloading JS files..."
while read url; do
    filename=$(echo "$url" | md5sum | cut -d' ' -f1).js
    curl -s "$url" -o "$OUTDIR/files/$filename" 2>/dev/null
    echo "$url → $filename" >> $OUTDIR/files/url_map.txt
done < $OUTDIR/js_urls.txt

# Extract endpoints
echo "[3/6] Extracting endpoints..."
for f in $OUTDIR/files/*.js; do
    grep -oiE '["'"'"'](\/[a-zA-Z0-9_/\-\.]+)["'"'"']' "$f" 2>/dev/null
done | sort -u > $OUTDIR/endpoints/all_endpoints.txt

# Search for secrets
echo "[4/6] Searching for secrets..."
for f in $OUTDIR/files/*.js; do
    # API keys
    grep -oiE '(api[_-]?key|apikey|api_secret|secret_key|auth_token)["\x27:\s=]*["\x27][a-zA-Z0-9_\-]{16,}["\x27]' "$f" >> $OUTDIR/secrets/api_keys.txt 2>/dev/null
    # AWS keys
    grep -oiE 'AKIA[0-9A-Z]{16}' "$f" >> $OUTDIR/secrets/aws_keys.txt 2>/dev/null
    # JWT
    grep -oiE 'eyJ[a-zA-Z0-9_-]*\.eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*' "$f" >> $OUTDIR/secrets/jwt_tokens.txt 2>/dev/null
    # URLs
    grep -oiE 'https?://[^\s"'"'"'<>]+' "$f" >> $OUTDIR/secrets/urls.txt 2>/dev/null
done

# Check for source maps
echo "[5/6] Checking for source maps..."
while read url; do
    map_url="${url}.map"
    code=$(curl -s -o /dev/null -w "%{http_code}" "$map_url" 2>/dev/null)
    if [ "$code" = "200" ]; then
        echo "[+] Source map found: $map_url" >> $OUTDIR/sourcemaps/found.txt
        curl -s "$map_url" -o "$OUTDIR/sourcemaps/$(basename $map_url)" 2>/dev/null
    fi
done < $OUTDIR/js_urls.txt

# Summary
echo "[6/6] Generating summary..."
echo "=== JS Analysis Summary ===" > $OUTDIR/summary.txt
echo "JS Files: $(wc -l < $OUTDIR/js_urls.txt)" >> $OUTDIR/summary.txt
echo "Endpoints: $(wc -l < $OUTDIR/endpoints/all_endpoints.txt)" >> $OUTDIR/summary.txt
echo "Potential Secrets: $(cat $OUTDIR/secrets/*.txt 2>/dev/null | wc -l)" >> $OUTDIR/summary.txt
echo "Source Maps: $(cat $OUTDIR/sourcemaps/found.txt 2>/dev/null | wc -l)" >> $OUTDIR/summary.txt
cat $OUTDIR/summary.txt
Task 8
API Endpoint Discovery
1. API Discovery Methodology
API Discovery Flow:
1. Documentation → Swagger/OpenAPI, GraphQL introspection
2. JavaScript analysis → fetch/axios calls, route definitions
3. Historical data → Wayback Machine, cached pages
4. Traffic analysis → Proxy interception (Burp Suite)
5. Bruteforcing → wordlist-based endpoint discovery
6. Mobile app analysis → Decompile APK/IPA
7. Error messages → Stack traces revealing routes
2. API Documentation Discovery
# Common API documentation paths
PATHS="/swagger /swagger-ui /swagger-ui.html /swagger/index.html /api-docs
/api-docs/swagger.json /api/swagger.json /openapi.json /openapi.yaml
/openapi/v3/api-docs /v1/api-docs /v2/api-docs /v3/api-docs /docs /docs/api
/redoc /api/docs /graphql /graphiql /playground /graphql/playground /_api
/api /api/v1 /api/v2 /api/v3 /swagger/v1/swagger.json /swagger/v2/swagger.json
/api-documentation /developer /developer/docs /swagger-resources
/api/swagger-resources /.well-known/openapi.json /.well-known/openapi.yaml
/api/schema /api/openapi /api/spec /documentation /api/documentation
/api-explorer /api/explorer"

for path in $PATHS; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "https://target.com$path" 2>/dev/null)
    if [ "$code" != "404" ] && [ "$code" != "000" ]; then
        echo "[+] $path → $code"
    fi
done

# Download and parse Swagger/OpenAPI spec
curl -s https://target.com/swagger.json | jq '.paths | keys[]' | sort
curl -s https://target.com/openapi.json | jq '.paths | keys[]' | sort

# Extract all endpoints from OpenAPI spec
curl -s https://target.com/swagger.json | jq -r '.paths | to_entries[] | .key as $path | .value | to_entries[] | "\(.key | ascii_upcase) \($path)"'
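If `jq` is unavailable, path keys can still be pulled from a spec with a PCRE grep, since OpenAPI path keys are the only JSON object keys that begin with "/"; the minimal spec written below is fabricated for the demo:

```shell
# Match quoted strings starting with "/" that are immediately
# followed by a colon (i.e. used as object keys)
printf '{"paths":{"/users":{"get":{}},"/orders/{id}":{"delete":{}}}}' > /tmp/spec.json
grep -oP '"/[^"]*"(?=\s*:)' /tmp/spec.json | tr -d '"'
```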
3. GraphQL Endpoint Discovery
# Common GraphQL paths
GRAPHQL_PATHS="/graphql /graphiql /v1/graphql /v2/graphql /api/graphql /query
/gql /graphql/console /playground /graphql/playground /altair /api/graphiql
/graphql-explorer"

for path in $GRAPHQL_PATHS; do
    code=$(curl -s -o /dev/null -w "%{http_code}" -X POST \
        -H "Content-Type: application/json" \
        -d '{"query":"{__typename}"}' \
        "https://target.com$path" 2>/dev/null)
    if [ "$code" = "200" ]; then
        echo "[+] GraphQL endpoint: $path"
    fi
done

# GraphQL introspection query
curl -s -X POST -H "Content-Type: application/json" \
    -d '{"query":"{ __schema { types { name fields { name type { name kind ofType { name } } } } } }"}' \
    https://target.com/graphql | jq .

# Full introspection
curl -s -X POST -H "Content-Type: application/json" \
    -d '{"query":"query IntrospectionQuery { __schema { queryType { name } mutationType { name } subscriptionType { name } types { ...FullType } directives { name description locations args { ...InputValue } } } } fragment FullType on __Type { kind name description fields(includeDeprecated: true) { name description args { ...InputValue } type { ...TypeRef } isDeprecated deprecationReason } inputFields { ...InputValue } interfaces { ...TypeRef } enumValues(includeDeprecated: true) { name description isDeprecated deprecationReason } possibleTypes { ...TypeRef } } fragment InputValue on __InputValue { name description type { ...TypeRef } defaultValue } fragment TypeRef on __Type { kind name ofType { kind name ofType { kind name ofType { kind name ofType { kind name ofType { kind name ofType { kind name ofType { kind name } } } } } } } }"}' \
    https://target.com/graphql > introspection.json

# GraphQL Voyager - visualize schema
# https://graphql-kit.com/graphql-voyager/

# InQL - Burp Suite extension for GraphQL
# Automatically discovers queries, mutations, subscriptions

# clairvoyance - GraphQL field bruteforcing (when introspection is disabled)
python3 clairvoyance.py -w wordlist.txt -d https://target.com/graphql
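Whether introspection is enabled can be judged from the response body rather than the status code; the two canned responses below are made-up examples of each case, and the `check_introspection` helper is illustrative:

```shell
enabled='{"data":{"__schema":{"queryType":{"name":"Query"}}}}'
disabled='{"errors":[{"message":"GraphQL introspection is not allowed"}]}'

# A response containing "__schema" data means introspection is on;
# an errors-only response means it is blocked
check_introspection() {
    echo "$1" | grep -q '"__schema"' && echo enabled || echo disabled
}
check_introspection "$enabled"    # enabled
check_introspection "$disabled"   # disabled
```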
4. REST API Endpoint Bruteforcing
# Wordlist-based API discovery
ffuf -u https://target.com/api/FUZZ -w /usr/share/seclists/Discovery/Web-Content/api/api-endpoints.txt -mc 200,201,204,301,302,401,403

# Common API versioning patterns
for ver in v1 v2 v3 v4; do
    ffuf -u "https://target.com/api/$ver/FUZZ" \
        -w /usr/share/seclists/Discovery/Web-Content/common.txt \
        -mc 200,201,204,301,302,401,403 -o "api_${ver}.json"
done

# API-specific wordlists
# /usr/share/seclists/Discovery/Web-Content/api/
# /usr/share/seclists/Discovery/Web-Content/api/api-endpoints.txt
# /usr/share/seclists/Discovery/Web-Content/api/api-seen-in-wild.txt
# /usr/share/seclists/Discovery/Web-Content/api/objects.txt
# /usr/share/seclists/Discovery/Web-Content/api/actions.txt

# Kiterunner - API-aware content discovery
kr scan https://target.com -w routes-large.kite -x 20
kr scan https://target.com -A=apiroutes-210228:20000

# Try different HTTP methods
for method in GET POST PUT DELETE PATCH OPTIONS HEAD; do
    code=$(curl -s -o /dev/null -w "%{http_code}" -X $method https://target.com/api/users 2>/dev/null)
    echo "$method /api/users → $code"
done

# Content-Type variations
curl -s -X POST -H "Content-Type: application/json" -d '{}' https://target.com/api/users
curl -s -X POST -H "Content-Type: application/xml" -d '<user/>' https://target.com/api/users
curl -s -X POST -H "Content-Type: application/x-www-form-urlencoded" -d 'test=1' https://target.com/api/users
5. WADL and WSDL Discovery
# WADL (Web Application Description Language) - REST
WADL_PATHS="/application.wadl /api/application.wadl /services/application.wadl /rest/application.wadl"

for path in $WADL_PATHS; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "https://target.com$path")
    if [ "$code" = "200" ]; then
        echo "[+] WADL found: $path"
        curl -s "https://target.com$path" > wadl.xml
    fi
done

# WSDL (Web Services Description Language) - SOAP
WSDL_PATHS="?wsdl ?WSDL /services?wsdl /ws?wsdl /service?wsdl /api?wsdl /webservice?wsdl"

for suffix in $WSDL_PATHS; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "https://target.com$suffix")
    if [ "$code" = "200" ]; then
        echo "[+] WSDL found: $suffix"
        curl -s "https://target.com$suffix" > wsdl.xml
    fi
done

# Parse WSDL for operations
curl -s "https://target.com?wsdl" | grep -oP 'name="[^"]*"' | sort -u
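Grepping every `name="..."` attribute pulls in port and message names too; a grep anchored on the `<operation` tag is more precise. The stub WSDL written below is fabricated for the demonstration:

```shell
cat > /tmp/svc.wsdl <<'EOF'
<definitions><portType name="UserPort">
<operation name="GetUser"/><operation name="DeleteUser"/>
</portType></definitions>
EOF
# Keep only the name attribute of <operation> elements
grep -oP '<operation name="\K[^"]+' /tmp/svc.wsdl
```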
6. Mobile App API Extraction
# Android APK analysis

# Decompile APK
apktool d target-app.apk -o decompiled/
jadx target-app.apk -d jadx_output/

# Search for API endpoints in decompiled code
grep -rn "http://" jadx_output/ | grep -v ".png\|.jpg\|.gif"
grep -rn "https://" jadx_output/ | grep -v ".png\|.jpg\|.gif"
grep -rn "api" jadx_output/ --include="*.java" --include="*.xml" --include="*.json"

# Find API base URLs
grep -rn "BASE_URL\|base_url\|API_URL\|api_url\|SERVER_URL" jadx_output/

# iOS IPA analysis
# Unzip IPA
unzip target-app.ipa -d ipa_contents/
# Use class-dump, Hopper, or IDA for binary analysis

# strings extraction (classes.dex lives inside the APK itself)
unzip -p target-app.apk classes.dex | strings | grep -iE "https?://" | sort -u

# Search for API keys
grep -rn "api_key\|apikey\|api-key\|secret\|token" jadx_output/ --include="*.java" --include="*.xml"

# Network Security Config (Android)
cat decompiled/res/xml/network_security_config.xml
# Check for cleartext traffic allowed, custom trust anchors

# MobSF - Mobile Security Framework (automated analysis)
# docker run -it --rm -p 8000:8000 opensecurity/mobile-security-framework-mobsf
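The decompiled-source greps can be rehearsed on a stub tree before running them against real `jadx` output; the directory and Java file below are fabricated for the demonstration:

```shell
mkdir -p /tmp/jadx_output
cat > /tmp/jadx_output/Config.java <<'EOF'
public class Config {
    static final String BASE_URL = "https://api.target.com/v2/";
}
EOF
# -h suppresses filenames so the output is just the URLs
grep -rhoE 'https?://[^" ]+' /tmp/jadx_output/
```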
7. Proxy-Based API Discovery
# Burp Suite approach:
# 1. Configure proxy
# 2. Browse the application thoroughly
# 3. Check Proxy → HTTP History → filter by API paths
# 4. Use Target → Site map to see all discovered endpoints
# 5. Export endpoints from site map

# Mitmproxy
mitmproxy --mode regular -p 8080
# Navigate the application, then analyze captured traffic
# mitmproxy: press 'z' to clear, 'f' to filter

# mitmproxy dump
mitmdump -w api_traffic.flow
# Later analysis
mitmproxy -r api_traffic.flow

# ZAP Spider + AJAX Spider
# Automated crawling discovers API endpoints
# ZAP → Tools → Spider → Target URL → Start
# ZAP → Tools → AJAX Spider → Target URL → Start

# Extract unique API paths from proxy history
# Burp: Extensions → Logger++ → Export
# Filter: URL matches regex /api/
8. Error-Based API Discovery
# Trigger error messages that reveal API routes

# Force 405 Method Not Allowed
curl -s -X DELETE https://target.com/api/ | head -20
# Response may list allowed methods

# Request non-existent endpoint
curl -s https://target.com/api/nonexistent12345
# Framework may reveal route patterns in error

# Debug mode endpoints
curl -s https://target.com/api/debug
curl -s https://target.com/debug/routes
curl -s https://target.com/_debug
curl -s https://target.com/routes

# Framework-specific route listing
# Laravel: /api/routes (if debug enabled)
# Django: / (with DEBUG=True shows all URL patterns)
# Express: custom debug middleware
# Spring Boot: /actuator/mappings
curl -s https://target.com/actuator/mappings | jq '.contexts[].mappings.dispatcherServlets[][].details.requestMappingConditions.patterns[]'
# Flask: /__debugger__
curl -s https://target.com/__debugger__
# Rails: /rails/info/routes (development mode)
curl -s https://target.com/rails/info/routes

# Send malformed requests
curl -s -X POST https://target.com/api/ -H "Content-Type: application/json" -d '{invalid}'
# Error may reveal framework and routing info
9. API Versioning and Legacy Endpoints
# Check multiple API versions
for ver in v0 v1 v2 v3 v4 v5; do
    for endpoint in users accounts products orders transactions; do
        code=$(curl -s -o /dev/null -w "%{http_code}" "https://target.com/api/$ver/$endpoint" 2>/dev/null)
        if [ "$code" != "404" ] && [ "$code" != "000" ]; then
            echo "[+] /api/$ver/$endpoint → $code"
        fi
    done
done

# Header-based versioning
curl -s -H "Accept: application/vnd.api.v1+json" https://target.com/api/users
curl -s -H "Accept: application/vnd.api.v2+json" https://target.com/api/users
curl -s -H "Api-Version: 1" https://target.com/api/users
curl -s -H "Api-Version: 2" https://target.com/api/users
curl -s -H "X-API-Version: 2024-01-01" https://target.com/api/users

# Query parameter versioning
curl -s "https://target.com/api/users?version=1"
curl -s "https://target.com/api/users?api_version=2"

# Legacy/deprecated endpoints often lack security controls
# Try: /api/v0/, /api/beta/, /api/alpha/, /api/internal/
# Try: /api/legacy/, /api/old/, /api/deprecated/
10. Complete API Discovery Script
#!/bin/bash
# api_discovery.sh
TARGET=$1
OUTDIR="recon/${TARGET}/api"
mkdir -p $OUTDIR

echo "=== API Discovery: $TARGET ==="

# Check common documentation paths
echo "[1/5] Checking API documentation paths..."
DOC_PATHS="/swagger /swagger-ui /swagger-ui.html /swagger.json
/swagger/v1/swagger.json /api-docs /openapi.json /openapi.yaml /v1/api-docs
/v2/api-docs /graphql /graphiql /playground /docs /api/docs /redoc
/documentation /api-explorer /api/schema /.well-known/openapi.json
/actuator/mappings /debug/routes /rails/info/routes"

for path in $DOC_PATHS; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "https://$TARGET$path" 2>/dev/null)
    if [ "$code" != "404" ] && [ "$code" != "000" ] && [ "$code" != "" ]; then
        echo "[+] $path → $code" >> $OUTDIR/docs_found.txt
    fi
done
echo "[*] Documentation results: $(wc -l < $OUTDIR/docs_found.txt 2>/dev/null || echo 0)"

# GraphQL detection
echo "[2/5] Testing GraphQL endpoints..."
for path in /graphql /api/graphql /v1/graphql /query /gql; do
    result=$(curl -s -X POST -H "Content-Type: application/json" \
        -d '{"query":"{__typename}"}' "https://$TARGET$path" 2>/dev/null)
    if echo "$result" | grep -q "data"; then
        echo "[+] GraphQL endpoint: $path" >> $OUTDIR/graphql_endpoints.txt
    fi
done

# API versioning check
echo "[3/5] Checking API versions..."
for ver in v0 v1 v2 v3; do
    code=$(curl -s -o /dev/null -w "%{http_code}" "https://$TARGET/api/$ver/" 2>/dev/null)
    if [ "$code" != "404" ] && [ "$code" != "000" ]; then
        echo "[+] /api/$ver/ → $code" >> $OUTDIR/api_versions.txt
    fi
done

# Endpoint bruteforcing
echo "[4/5] Bruteforcing API endpoints..."
for base in /api /api/v1 /api/v2; do
    ffuf -u "https://$TARGET${base}/FUZZ" \
        -w /usr/share/seclists/Discovery/Web-Content/api/api-endpoints.txt \
        -mc 200,201,204,301,302,401,403,405 \
        -o $OUTDIR/ffuf_${base//\//_}.json 2>/dev/null
done

# Method testing on discovered endpoints
echo "[5/5] Testing HTTP methods..."
if [ -f $OUTDIR/docs_found.txt ]; then
    while read line; do
        path=$(echo "$line" | awk '{print $2}')
        for method in GET POST PUT DELETE PATCH; do
            code=$(curl -s -o /dev/null -w "%{http_code}" -X $method "https://$TARGET$path" 2>/dev/null)
            echo "$method $path → $code" >> $OUTDIR/method_testing.txt
        done
    done < $OUTDIR/docs_found.txt
fi

echo "[*] API discovery complete. Results in $OUTDIR/"