Big Data, Python, and the future of security
More Good vs. Bad
Cheap data storage and processing now make it practical to look for bad traffic retroactively. For example, if you archive all your traffic, you could in theory replay it to your antivirus solution a week or so later. A new virus may not be detected right away, but after a week your AV solution should have a signature for it, so you could detect the payload and identify the traffic that resulted in the compromise. Something like Bayesian filtering would be valuable for this approach: by discarding all the known-good traffic and logging only the unknown and known-bad traffic, you limit the amount of data you need to store and process.
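As a rough illustration of the Bayesian filtering idea, the following sketch trains a tiny naive Bayes classifier on a handful of labeled traffic records and archives only what it cannot confidently call good. The record format, the training samples, the class and function names, and the 0.1 threshold are all assumptions made for this example; a real deployment would need far more training data and better features.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes over whitespace-separated tokens."""

    def __init__(self):
        self.token_counts = {"good": Counter(), "bad": Counter()}
        self.doc_counts = Counter()

    def train(self, label, record):
        """Add one labeled record ('good' or 'bad') to the model."""
        self.doc_counts[label] += 1
        self.token_counts[label].update(record.lower().split())

    def _log_score(self, label, tokens, vocab_size):
        counts = self.token_counts[label]
        total = sum(counts.values())
        # Class prior, then Laplace-smoothed token likelihoods.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + vocab_size))
        return score

    def prob_bad(self, record):
        """Return the estimated probability that a record is bad."""
        tokens = record.lower().split()
        vocab = set(self.token_counts["good"]) | set(self.token_counts["bad"])
        vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
        good = self._log_score("good", tokens, vocab_size)
        bad = self._log_score("bad", tokens, vocab_size)
        # Clamp the difference so math.exp cannot overflow.
        diff = max(min(good - bad, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Hypothetical training data: simplified HTTP request summaries.
flt = NaiveBayesFilter()
flt.train("good", "GET /index.html 200 mozilla")
flt.train("good", "GET /images/logo.png 200 mozilla")
flt.train("good", "GET /about.html 200 mozilla")
flt.train("bad", "POST /cgi-bin/php.cgi 500 curl")
flt.train("bad", "GET /etc/passwd 404 curl")


def should_archive(record, threshold=0.1):
    """Keep unknown and known-bad records; drop confidently good ones."""
    return flt.prob_bad(record) >= threshold


if __name__ == "__main__":
    for rec in ("GET /index.html 200 mozilla",
                "GET /etc/passwd 404 curl",
                "PUT /upload.bin 201 wget"):
        print(rec, "->", "archive" if should_archive(rec) else "discard")
```

The threshold is set low on purpose: a record made up entirely of unseen tokens scores close to the class prior, so it is archived along with the known-bad traffic, and only traffic that closely resembles the known-good samples is discarded.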
SELinux and Local Attacks
Another issue is minimizing the time it takes to correlate events. If you can determine that traffic is bad within a few minutes instead of a few days, you can limit the impact of the attack and possibly prevent more systems from being compromised. One way to do this is to watch for SELinux violations and other host-based intrusion detection system (HIDS) violations. In general, if you have properly configured SELinux for your applications (which in most cases means using the default policy), you should get zero violations.
So, if software causes an SELinux violation, in theory, this could mean it has been compromised. In practice, however, it's much more likely to be a false positive, which is a good reason to learn SELinux and update your system policies and file labels as needed. The same goes for syslog entries and other places where applications typically complain about problems. The more sources of information you can process, the better, especially sources that carry as much metadata as audit logs and syslog.
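To make the audit log idea concrete, here is a minimal sketch that extracts SELinux AVC denials from audit log lines and counts them per command, optionally restricted to recent events. The sample lines follow the usual audit.log AVC layout but are invented for illustration, and the function names and the regular expression are assumptions rather than part of any SELinux tooling.

```python
import re
from collections import Counter

# Matches the common fields of an AVC denial record in audit.log.
AVC_RE = re.compile(
    r'type=AVC msg=audit\((?P<ts>\d+\.\d+):(?P<serial>\d+)\): '
    r'avc:\s+denied\s+\{ (?P<perms>[^}]+) \}.*?'
    r'comm="(?P<comm>[^"]+)".*?'
    r'scontext=(?P<scontext>\S+) tcontext=(?P<tcontext>\S+) '
    r'tclass=(?P<tclass>\S+)'
)


def parse_avc(line):
    """Return a dict describing an AVC denial, or None for other lines."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["ts"] = float(event["ts"])
    event["perms"] = event["perms"].split()
    return event


def denials_by_command(lines, since=0.0):
    """Count AVC denials per command for events at or after `since`."""
    counts = Counter()
    for line in lines:
        event = parse_avc(line)
        if event is not None and event["ts"] >= since:
            counts[event["comm"]] += 1
    return counts


# Invented sample records in the usual audit.log layout.
SAMPLE_LOG = [
    'type=AVC msg=audit(1364481363.243:24287): avc:  denied  { read } for  '
    'pid=2465 comm="httpd" name="shadow" dev="dm-0" ino=399185 '
    'scontext=system_u:system_r:httpd_t:s0 '
    'tcontext=system_u:object_r:shadow_t:s0 tclass=file',
    'type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e syscall=2 '
    'success=no exit=-13 comm="httpd" exe="/usr/sbin/httpd"',
    'type=AVC msg=audit(1364481400.101:24290): avc:  denied  '
    '{ name_connect } for  pid=2465 comm="httpd" dest=4444 '
    'scontext=system_u:system_r:httpd_t:s0 '
    'tcontext=system_u:object_r:unreserved_port_t:s0 tclass=tcp_socket',
    'type=AVC msg=audit(1364481500.500:24301): avc:  denied  { write } for  '
    'pid=3010 comm="smbd" name="data" dev="dm-0" ino=1201 '
    'scontext=system_u:system_r:smbd_t:s0 '
    'tcontext=unconfined_u:object_r:default_t:s0 tclass=dir',
]

if __name__ == "__main__":
    for comm, count in denials_by_command(SAMPLE_LOG).most_common():
        print(comm, count)
```

On a system that normally produces zero violations, any nonzero count is worth a look. The parsed source and target contexts give you the metadata needed to decide whether it is a labeling problem or something more serious.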
Filters and rule-based security haven't worked very well for some time now – email was the first to fall, and network traffic is rapidly becoming more vulnerable. The sheer volume of traffic and new attacks, as well as the encodings and encryption available to attackers, means that machine learning is the only practical, long-term option.
- "Big Data Excavation with Apache Hadoop" by Kenneth Geisshirt: http://www.linux-magazine.com/Issues/2012/144/Hadoop
- MongoDB: http://www.mongodb.org/
- scikit-learn – Machine Learning in Python: http://scikit-learn.org/
- mlpy – Machine Learning Python: http://mlpy.sourceforge.net/