Big Data, Python, and the future of security
More Good vs. Bad
An approach that is now possible because of cheap data storage and processing is to look retroactively for bad traffic. For example, if you archive all your traffic, in theory, you could replay it a week or so later to your antivirus solution. The idea here is that a new virus may not be detected right away, but after a week, your AV solution should have a signature for it. Thus, you could detect the payload and identify the traffic that resulted in the compromise. Something like Bayesian filtering would be valuable for this approach; by eliminating all the known good traffic and logging only the unknown/known bad, you can limit the amount of data you need to store and process.
SELinux and Local Attacks
Another issue involves minimizing the time for correlation of events. That means, if you can determine that bad traffic is bad within a few minutes instead of a few days, you can minimize the impact of the attack and possibly prevent more systems from being compromised. One way to do this is via SELinux violations and other host-based intrusion detection system (HIDS) violations. In general, if you have properly configured SELinux for your applications (which in most cases means using the default profiles), you should get zero violations.
So, if software causes an SELinux violation, in theory, this could mean it has been compromised. In practice, however, it's much more likely to be a false positive, which is a good reason to learn SELinux and update your system policies/file labels as needed. The same goes for syslog entries and other places where applications typically complain about problems. The more sources of information that you can process the better, especially ones with as much metadata as audit logs, syslog, and so on.
Filters and rule-based security haven't worked very well for some time now – email was the first to fall, and network traffic is rapidly becoming more vulnerable. The sheer volume of traffic and new attacks, as well as the encodings and encryption available to attackers, means that machine-based learning is the only practical, long-term option.
- "Big Data Excavation with Apache Hadoop" by Kenneth Geisshirt: http://www.linux-magazine.com/Issues/2012/144/Hadoop
- MongoDB: http://www.mongodb.org/
- scikit-learn – Machine Learning in Python: http://scikit-learn.org/
- mlpy – Machine Learning Python: http://mlpy.sourceforge.net/
Buy this article as PDF
New tool will look like GParted but support a wider range of storage technologies.
New public key pinning feature will help prevent man-in-the-middle attacks.
Carnegie Mellon researchers say 3 million pages could fall down the phishing hole in the next year.
The US government rolls new best-practice rules for protecting SSH.
Klaus Knopper announces the latest version of his iconic Live Linux system.
All websites that use these popular CMS tools could be vulnerable to denial of service attacks if users don't install the updates.
According to a report, many potential victims of the Heartbleed attack have patched their systems, but few have cleaned up the crime scene to protect themselves from the effects of a previous intrusion.
DARPA and NICTA release the code for the ultra-secure microkernel system used in aerial drones.
Should you trust an online service to store your online passwords?
New B+ board lets you build cool things without the complication of a powered USB hub.