Scavenger – Crawler Searching For Credential Leaks On Different Paste Sites

Just the code of my OSINT bot searching for sensitive data leaks on different paste sites.

Search terms:

- credentials
- private RSA keys
- WordPress configuration files
- MySQL connect strings
- onion links
- links to files hosted inside the onion network (PDF, DOC, DOCX, XLS, XLSX)

Keep in mind: This bot is not beautiful. The code is not complete so far. Some parts, like integrating the credentials into a database, are missing from this online repository. If you want to use this code, feel free to do so. Keep in mind that you have to customize things to make it run on your system.

IMPORTANT

The bot can be run in two major modes:

- API mode
- Scraping mode (using TOR)

Using the API mode is highly recommended. It is the intended method of scraping pastes, and it is only fair to use it. The only things you need are a PRO account and your public IP whitelisted on the paste site.

To start the bot in API mode, run the program in the following way:

    python -0

However, it is not always possible to use this intended method: you might be behind NAT and therefore not have an IP of your own (whitelisting your IP is not reasonable in that case). For this reason a scraping mode is implemented, in which fast TOR cycles in combination with reasonable user agents are used to avoid IP blocking and Cloudflare captchas.

To start the bot in scraping mode, run it in the following way:

    python -1

Important note: you need the TOR service installed on your system and listening on port 9050.
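To make the scraping mode more concrete, here is a minimal sketch of how a request could be routed through the local TOR SOCKS proxy on port 9050 with a randomly chosen user agent. It is not taken from the bot's code: the names `TOR_SOCKS_PROXY`, `USER_AGENTS`, `build_request_settings` and `fetch_via_tor` are hypothetical, the user-agent strings are placeholders, and the fetch function assumes the third-party `requests` library is installed with SOCKS support (`pip install requests[socks]`).

```python
import random

# Local TOR SOCKS proxy; port 9050 is the one named in the text.
# "socks5h" lets the proxy resolve hostnames, so DNS lookups also go through TOR.
TOR_SOCKS_PROXY = "socks5h://127.0.0.1:9050"

# Placeholder user agents (assumption: the bot's real list is not shown here).
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


def build_request_settings(rng=random):
    """Return keyword arguments for requests.get(): proxy, user agent, timeout."""
    return {
        "proxies": {"http": TOR_SOCKS_PROXY, "https": TOR_SOCKS_PROXY},
        "headers": {"User-Agent": rng.choice(USER_AGENTS)},
        "timeout": 30,
    }


def fetch_via_tor(url):
    """Download one page through TOR and return its body as text."""
    import requests  # third-party; imported lazily so the helpers above work without it

    response = requests.get(url, **build_request_settings())
    response.raise_for_status()
    return response.text
```

In this sketch the code never asks for a new circuit itself; rotating the exit IP is left entirely to the TOR service configuration.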
Additionally, you need to add the following line to your /etc/tor/torrc file:

    MaxCircuitDirtiness 30

This sets the maximum cycle time of TOR to 30 seconds.

Usage

To learn how to use the software, just call the script with the -h/--help argument:

    python -h

Output:

    [ASCII art banner]

    usage: [-h] [-0] [-1] [-2] [-ps]

    Control software for the different modules of this paste crawler.

    optional arguments:
      -h, --help            show this help message and exit
      -0, --pastebinCOMapi  Activate module (using API)
      -1, --pastebinCOMtor  Activate module (standard scraping using TOR to avoid IP blocking)
      -2, --pasteORG        Activate module
      -ps, --pStatistic     Show a simple statistic.

So far I have only implemented one module and am working on another. I will add more modules and update this script over time.

Just start the module separately…

    python P_bot.py

Pastes are stored in data/raw_pastes until there are more than 48000 of them. Once that threshold is exceeded, they are filtered, zipped and moved to the archive folder. All pastes which contain credentials are stored in data/files_with_passwords.

Keep in mind that at the moment only combinations like USERNAME:PASSWORD and other simple combinations are detected. However, there is a tool to search for proxy logs containing credentials.

You can search for proxy logs (URLs with username and password combinations) in the following way:

    python data/raw_pastes

If you want to search the raw data for specific strings, you can do so in the following way (really slow):

    python SEARCHSTRING

To see statistics of the bot, just call:

    python

There is also a script that searches a folder (with pastes) for sensitive data like credit cards, RSA keys or mysqli_connect strings. Keep in mind that this script uses grep and is therefore really slow on a large number of paste files.
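The kinds of matches described above (USERNAME:PASSWORD pairs, proxy-log URLs carrying credentials, RSA keys, mysqli_connect strings) can be illustrated with a small pure-Python sketch. It is not the bot's code, which relies on grep; the regular expressions and the names `PATTERNS` and `classify_paste` are hypothetical and deliberately simple, so expect both false positives and misses on real data.

```python
import re

# Hypothetical, deliberately simple patterns (assumption: the bot's real rules differ).
PATTERNS = {
    # A whole line of the form USERNAME:PASSWORD or EMAIL:PASSWORD.
    # The (?!//) lookahead keeps plain URLs such as "http://..." from matching.
    "user_pass": re.compile(
        r"^[\w.+-]+(?:@[\w-]+(?:\.[\w-]+)+)?:(?!//)\S{4,}$", re.MULTILINE
    ),
    # A URL with embedded credentials, as found in proxy logs.
    "proxy_log_url": re.compile(r"https?://[^\s/:@]+:[^\s/@]+@[^\s/]+"),
    # Header line of a PEM-encoded RSA private key.
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
    # A PHP mysqli_connect(...) call.
    "mysqli_connect": re.compile(r"mysqli_connect\s*\("),
}


def classify_paste(text):
    """Return the sorted names of all patterns that match somewhere in text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

A paste would then count as interesting whenever `classify_paste` returns a non-empty list, for example `classify_paste(open(path, errors="ignore").read())` for each file in a paste folder.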
If you want to analyze a large number of pastes, I recommend an ELK stack.

    python data/raw_pastes

There are two scripts which can be used to monitor a specific Twitter user. Every tweet the user posts is saved, and every URL it contains is downloaded. To start the stalker, just execute the wrapper:

    python stalk_user_wrapper.py