Big Data, Python, and the future of security
More Good vs. Bad
An approach that is now possible because of cheap data storage and processing is to look retroactively for bad traffic. For example, if you archive all your traffic, in theory, you could replay it a week or so later to your antivirus solution. The idea here is that a new virus may not be detected right away, but after a week, your AV solution should have a signature for it. Thus, you could detect the payload and identify the traffic that resulted in the compromise. Something like Bayesian filtering would be valuable for this approach; by eliminating all the known good traffic and logging only the unknown/known bad, you can limit the amount of data you need to store and process.
SELinux and Local Attacks
Another issue involves minimizing the time for correlation of events. That means, if you can determine that bad traffic is bad within a few minutes instead of a few days, you can minimize the impact of the attack and possibly prevent more systems from being compromised. One way to do this is via SELinux violations and other host-based intrusion detection system (HIDS) violations. In general, if you have properly configured SELinux for your applications (which in most cases means using the default profiles), you should get zero violations.
So, if software causes an SELinux violation, in theory, this could mean it has been compromised. In practice, however, it's much more likely to be a false positive, which is a good reason to learn SELinux and update your system policies/file labels as needed. The same goes for syslog entries and other places where applications typically complain about problems. The more sources of information that you can process the better, especially ones with as much metadata as audit logs, syslog, and so on.
Filters and rule-based security haven't worked very well for some time now – email was the first to fall, and network traffic is rapidly becoming more vulnerable. The sheer volume of traffic and new attacks, as well as the encodings and encryption available to attackers, means that machine-based learning is the only practical, long-term option.
- "Big Data Excavation with Apache Hadoop" by Kenneth Geisshirt: http://www.linux-magazine.com/Issues/2012/144/Hadoop
- MongoDB: http://www.mongodb.org/
- scikit-learn – Machine Learning in Python: http://scikit-learn.org/
- mlpy – Machine Learning Python: http://mlpy.sourceforge.net/
Buy this article as PDF
Bug database has a bug of its own that could allow an intruder to create an unauthorized account.
Report focuses federal resources on achieving universal Internet access.
Leading browser makers say “no” to porous encryption algorithm
Report from the X-Force group says attackers are using TOR to hide their crimes
Future Firefox extensions will be compatible with Chrome.
Better read this if you bought your computer before 2011
Users should upgrade to the new version as soon as possible
Xen project announces a privilege escalation problem for Qemu host systems
Attackers can compromise an Android phone just by sending a text message
PC vendor will pre-install Ubuntu on portables in India.