Big Data, Python, and the future of security

More Good vs. Bad

Cheap data storage and processing make another approach possible: looking for bad traffic retroactively. For example, if you archive all your traffic, you could, in theory, replay it to your antivirus solution a week or so later. The idea is that a new virus might not be detected right away, but after a week, your AV solution should have a signature for it. You could then detect the payload and identify the traffic that resulted in the compromise. Something like Bayesian filtering would be valuable for this approach; by eliminating all the known good traffic and logging only the unknown and known bad traffic, you can limit the amount of data you need to store and process.
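The retroactive idea can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not how a real AV product works: payloads are archived as SHA-256 hashes, and a plain set of known-bad hashes stands in for the updated signature database. The names ArchivedPayload, archive, and rescan, as well as the sample hosts and payloads, are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedPayload:
    """One archived transfer: when it was seen, who received it, and its hash."""
    timestamp: float   # seconds since the epoch
    dest_host: str     # internal host that received the payload
    sha256: str        # hex digest of the payload bytes


def archive(timestamp, dest_host, payload):
    """Record a payload by hash (assumption: hashes stand in for full captures)."""
    digest = hashlib.sha256(payload).hexdigest()
    return ArchivedPayload(timestamp, dest_host, digest)


def rescan(records, known_bad_hashes):
    """Return archived records whose hash matches a signature added later."""
    return [r for r in records if r.sha256 in known_bad_hashes]


# Traffic seen last week; nothing was flagged at the time.
records = [
    archive(1000.0, "10.0.0.5", b"harmless report"),
    archive(1060.0, "10.0.0.7", b"not yet known malware"),
]

# A week later, the (hypothetical) signature feed includes the new sample.
updated_signatures = {hashlib.sha256(b"not yet known malware").hexdigest()}

hits = rescan(records, updated_signatures)
for hit in hits:
    print(f"{hit.dest_host} received a now-known-bad payload at {hit.timestamp}")
```

Exact hash matching only catches byte-identical payloads, so in practice you would more likely feed the archived captures back through the AV engine itself. The structure stays the same: Keep enough metadata with each record to trace a late detection back to the host and time of delivery.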

SELinux and Local Attacks

Another issue is minimizing the time it takes to correlate events. If you can determine that traffic is bad within a few minutes instead of a few days, you can minimize the impact of the attack and possibly prevent more systems from being compromised. One way to do this is to watch for SELinux violations and alerts from other host-based intrusion detection systems (HIDS). In general, if you have properly configured SELinux for your applications (which in most cases means using the default policy), you should get zero violations.

So, if software causes an SELinux violation, in theory, this could mean it has been compromised. In practice, however, it's much more likely to be a false positive, which is a good reason to learn SELinux and update your system policies/file labels as needed. The same goes for syslog entries and other places where applications typically complain about problems. The more sources of information that you can process the better, especially ones with as much metadata as audit logs, syslog, and so on.
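As a rough sketch of how SELinux violations could feed such a pipeline, the following Python snippet pulls AVC denials out of audit log lines and counts them per source domain. The regular expression covers only the common "avc: denied" layout, the sample lines are illustrative rather than taken from a real system, and the helper names parse_denials and count_by_domain are hypothetical.

```python
import re
from collections import Counter

# Matches the fields of an AVC denial that matter for triage.
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}.*?'
    r'comm="(?P<comm>[^"]+)".*?'
    r'scontext=(?P<scontext>\S+)\s+'
    r'tcontext=(?P<tcontext>\S+)\s+'
    r'tclass=(?P<tclass>\S+)'
)


def parse_denials(lines):
    """Return one dict per AVC denial; lines that do not match are skipped."""
    denials = []
    for line in lines:
        match = AVC_RE.search(line)
        if match:
            denial = match.groupdict()
            denial["perms"] = denial["perms"].split()
            denials.append(denial)
    return denials


def count_by_domain(denials):
    """Count denials per source type, the third field of user:role:type[:level]."""
    return Counter(d["scontext"].split(":")[2] for d in denials)


# Illustrative lines in the usual audit.log shape (not from a real system).
sample_log = [
    'type=AVC msg=audit(1364481363.243:24287): avc:  denied  { read } for  '
    'pid=2465 comm="httpd" name="index.html" dev="dm-0" ino=12345 '
    'scontext=system_u:system_r:httpd_t:s0 '
    'tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file',
    'type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e syscall=2 '
    'success=no exit=-13 comm="httpd"',
    'type=AVC msg=audit(1364481399.101:24301): avc:  denied  { read open } for  '
    'pid=2466 comm="httpd" name="notes.txt" dev="dm-0" ino=12346 '
    'scontext=system_u:system_r:httpd_t:s0 '
    'tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file',
]

denials = parse_denials(sample_log)
counts = count_by_domain(denials)
for domain, total in counts.items():
    print(f"{domain}: {total} denial(s)")
```

On a correctly labeled system, the counts should stay at zero, so any domain that suddenly shows up here is worth a look, whether it turns out to be a policy problem or a compromise.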

Conclusion

Filters and rule-based security haven't worked very well for some time now – email was the first to fall, and network traffic is rapidly becoming more vulnerable. The sheer volume of traffic and new attacks, as well as the encodings and encryption available to attackers, means that machine learning is the only practical, long-term option.

Infos

  1. "Big Data Excavation with Apache Hadoop" by Kenneth Geisshirt: http://www.linux-magazine.com/Issues/2012/144/Hadoop
  2. MongoDB: http://www.mongodb.org/
  3. scikit-learn – Machine Learning in Python: http://scikit-learn.org/
  4. mlpy – Machine Learning Python: http://mlpy.sourceforge.net/

The Author

Kurt Seifried is an Information Security Consultant specializing in Linux and networks since 1996. He often wonders how it is that technology works on a large scale but often fails on a small scale.
