Detecting Malicious URLs - Part 1

Nowadays, the majority of computer attacks begin when a user visits a malicious webpage. A user can be tricked into voluntarily giving away confidential information on a phishing page, or fall victim to a drive-by download that results in a malware infection. An antivirus should therefore protect not only against malicious programs but also against potentially dangerous websites.

Antivirus products use different approaches to detect malware and malicious URLs, such as signature-based and heuristic detection. Signatures precisely identify a known file or URL, whereas heuristic detection estimates the probability that a file or URL is malicious.

The first approach is more reliable and produces fewer false positives, but it cannot protect against unknown malware: a threat must first be caught and analysed before a malware analyst can create a signature to detect it. In contrast, heuristic methods can catch previously unknown threats based on a list of suspicious criteria, although, being probabilistic in nature, they can be prone to false positives.

The same signature-based (blacklisting) and heuristic approaches can be applied to malicious URL detection. Before a URL is blacklisted, it can be investigated by downloading and analysing the site’s content and by scanning it with an antivirus or an IDS (Intrusion Detection System) to help determine whether it is malicious.
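As a rough illustration of the blacklisting idea, the sketch below checks a URL against a small in-memory list after normalizing it. The `BLACKLIST` contents, the `normalize` and `is_blacklisted` helpers, and the normalization rules are assumptions made for this example, not a description of how any particular antivirus stores or matches its signatures.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries, stored in normalized form (host + path).
BLACKLIST = {
    "paypal-intern.de/secure",
    "paaypall.5gbfree.com/index.php",
}

def normalize(url: str) -> str:
    """Reduce a URL to lowercase host plus path, dropping scheme, port,
    query string, fragment, and any trailing slash."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path

def is_blacklisted(url: str) -> bool:
    """Exact-match lookup: only URLs that were already analysed and
    added to the list are caught."""
    return normalize(url) in BLACKLIST
```

A lookup like this is cheap and unlikely to raise false alarms, but a URL that differs from a listed one by a single character passes unnoticed, which is the limitation of signatures described above.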

The list of events generated by Suricata IDS when blocking Blackhole exploits:

An example of an IDS report with threats detected using predefined signatures:

A warning generated by Ad-Aware antivirus when visiting a malicious website:

Heuristic methods can be implemented on the client side to verify visited URLs: specially designed algorithms alert the user if the site they are visiting exhibits suspicious or malicious characteristics. These algorithms can be lexical or host-based. Lexical algorithms analyse the textual features of a URL and warn the user if the URL itself looks suspicious. For example, "http://paaypall.5gbfree.com/index.php" and "http://paypal-intern.de/secure/" can easily be identified as phishing copies of the "paypal" service just by looking at them.
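One possible lexical check, sketched below, splits the host name into tokens and flags the URL when any token is within a small edit distance of a protected brand name while the host is not on a list of trusted hosts. The brand, the `TRUSTED_HOSTS` set, the threshold of 2, and the function names are all assumptions chosen for illustration; a real lexical classifier would use many more features.

```python
import re
from urllib.parse import urlsplit

BRAND = "paypal"                                   # assumed protected brand
TRUSTED_HOSTS = {"paypal.com", "www.paypal.com"}   # assumed legitimate hosts
MAX_DISTANCE = 2                                   # assumed similarity threshold

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def looks_like_brand(url: str) -> bool:
    """Flag hosts that imitate BRAND but are not on the trusted list."""
    host = (urlsplit(url).hostname or "").lower()
    if host in TRUSTED_HOSTS:
        return False
    tokens = [t for t in re.split(r"[.\-]", host) if t]
    return any(edit_distance(t, BRAND) <= MAX_DISTANCE for t in tokens)
```

With these assumed settings, both example URLs above are flagged: "paaypall" is two deletions away from "paypal", and "paypal-intern.de" contains the brand itself on an untrusted host.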

Host-based algorithms collect information about the host and the registered domain name of a URL. Based on this information, the algorithm can decide whether a webpage is located on a trusted host. Host-based features include, for example, the geographical location of the host, who registered the domain and when, and information about the registrar.
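The sketch below shows how such features might be combined into a simple risk score. It assumes the host metadata has already been collected (for instance from WHOIS and DNS lookups) into a hypothetical `HostRecord`. The field names, the sets of known-bad IPs and high-risk registrars, the 30-day age cut-off, and the weights are all illustrative assumptions. Geographical location could be added as another feature in the same way.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class HostRecord:
    """Hypothetical host metadata gathered beforehand (e.g. WHOIS, DNS)."""
    ip: str
    registrar: str
    registered_on: date

# Assumed reference data; the IP is from a documentation-only range.
KNOWN_BAD_IPS = {"203.0.113.7"}
HIGH_RISK_REGISTRARS = {"example-cheap-registrar"}
NEW_DOMAIN_DAYS = 30  # assumed cut-off for a "recently registered" domain

def host_risk_score(rec: HostRecord, today: date) -> int:
    """Sum simple host-based indicators; a higher score is more suspicious."""
    score = 0
    if rec.ip in KNOWN_BAD_IPS:              # IP already hosts known phishing sites
        score += 2
    if (today - rec.registered_on).days < NEW_DOMAIN_DAYS:
        score += 1                           # very young domain
    if rec.registrar in HIGH_RISK_REGISTRARS:
        score += 1                           # registrar frequently abused
    return score
```

Because the score is a heuristic, any threshold applied to it trades missed threats against false positives and would need tuning on real data.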

Below is a real-life example in which all websites hosted on a single IP address are phishing sites.

In this series of publications devoted to the detection of malicious URLs, we will attempt to show whether host-based data can be put to further use.

Ultimately, despite the many ways to verify URLs, no single approach is perfect or sufficient to guarantee 100% security of your system. For now, only a combination of complementary security technologies can give us confidence in our personal security.
