Firefox Pre-Downloads

December 11th, 2009

Looking over IDS logs I noticed something very odd. A client with a USER-AGENT string indicating Firefox running on a Mac (Say: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5), downloading an EXE from a malicious host. A few options came to mind. User-agent switcher, a virus infection Mac via CrossOver or some weird mix with VMware Fusion were all options. It also occurred to me that the same users, desperate to get their porn fix might actually repeatedly download EXEs on a mac to get their ugh, codec.

None of that felt quite right. Then I remembered an older vulnerability where one could cluster bomb the default download directory with arbitray files — to hope a user might click or that you might trigger a secondary vulnerabilty. I can’t find references to this but I decided to check the default behavior of Firefox. As it turns out, if you click to download an EXE, JS initiaties the download dialog, whatever the file actually starts downloading before you click accept. Thus even if you click no/cancel, the file still downloads. Therefore, the network IDS triggers and you see odd things like Macs downloading EXEs.

Project Epic Fail – Nov 09

December 10th, 2009

Debugging a Xul Javascript Memory Corruption Bug in Firefox 3.5

mjw@cyberwart.com

November 2009

 

 

Introduction

Debugging complex software can be a difficult task. Even software as familiar as Firefox is often black boxes that few understand. This paper describes the process and mistakes I used to investigate a bug I discovered in the PL_DHashTableOperate function. This function is a memory management hash table that is used during garbage collection, among other things, inside XPCOM Cycle Collector.

 

Noticing the bug

This bug wasn’t discovered using elaborate fuzzing techniques. Rather, it was noticed while browsing a friend’s website:

 

The bug only presents in Firefox and only consistently (among systems I’ve tested) on Windows 7 (32 bit). Crashes are more likely when Google Toolbar is installed but that is not a requirement.

The page obviously utilizes a Google Maps and loads a plethora of data as seen above. After many markers are loaded the browser hangs and crashes with a memory read error near NULL. During the crash Firefox consumes in exess of 1.4GB of memory.

 

Debugging the Process

Firefox is a complex multithreaded application. Consuming large quantities of memory and/or crashing does directly point toward the problem. First we must consider our debugging environment. Initially, we were inclined to use Immunity Debugger (immdbg) as the layout and functionality are, in our opinion, superior to other options. Immdbg’s integrated Python is almost a must have. Unfortunately, Immdbg does not support recent debug symbols (PDB files). Given the complexity of Firefox, this is a significant handicap and we therefore opted to use Windbg. First to leverage this, remember to add Firefox’s debug server: http://symbols.mozilla.org/firefox

Next we attach the debugger and identify the runway thread:

Once we’ve identified the runaway thread, we obtain a stack trace.

 

 

The stack trace shows multiple calls to xul!AppendNodeTextNodeContentsRecurse. This seems like a good place to start. Appending generally implies a memory operation and we know our problem is related to heap allocations. Search for Xul, we find the XUL is Mozilla’s XML parsing library. Now we have a parser and a memory problem, which seems even more likely to be buggy.

 

We next use Immdbg to visualize the data being consumed by the thread:

The data corresponds to the restaurant information being loaded by the Google Maps API. This data is in a for loop and is XML. Looking at the Javascript we see:

In particular we note map_sidebar.appendChild() call. Appending map XML data seems very relevant to our bug so we begin testing. Unfortunately, it turns out this isn’t directly related to our bug. We will shortly provide an accurate stack trace of the problem, but our testing yielded some (mildly) notable results.

First, we tried excessively calling the appendChild() function to create a memory fault. This will sometimes cause Firefox to crash, but not with our fault. It also takes excessively long for the crash to occur (likely due to the small amount of data we are appending).

Observers might note the s string we’re building. We also thought we might be able to read out of bounds on markers to obtain unauthorized reads. This didn’t work. Javascript appears to alloc() memory for any marker[i].

Next we tried to manipulate the markers by going out of bounds and by inserting fixed data. This consumed memory but generally didn’t create a real problem.

Again, no notable effect.

 

 

 

Changing Directions

Next we observe the stack traces more thoroughly. We use Windbg’s !analyze and !exploitable features. These help some – mainly analyze.

 

We identify the problem in xul!PL_DHashtable. The problem occurs when ECX is passed in and used as a read address.

The calling function to PL_DHashTableOperate is xul!GCGraphBuilder::NoteRoot

 

A stack argument to NoteRoot is passed to PL_DHashTableOperate which uses it without bounds checking. Oddly, NoteRoot does a check on this value but still calls PL_DHashTableOperate.

Next we research the class a little more:

 

Looking at a chart of the relationships of the class we’re simply bewildered. Look at this:

 

 

How could such a thing not be buggy?

 

 

Tracing the problem is complicated. The libraries use C++, which often mangles call stacks. The objects use inheritance and the Garbage collector itself is a bit outside the scope of any individual class. It’s simply a related class that cleans up stray objects according to a complex tree algorithm.

 

Source

We obtain the source for the functions from Mozilla. The source may not be entirely accurate as it’s not guaranteed to be the build source.

 

 

 

 

AddXPConnectRoots iterates over Javascript contexts using the variable acx. It always calls NoteRoot. NoteRoot appears to do some validation, but looking at the disassembly we know any NULL void * root, should trigger the bug – as anything less than 0×0A should always continue to the vulnerable function. We’re not sure why the assert fails – the compiled code does some checking but not as the souce code indicates. We might very well be looking at different versions of code/binary.

 

 

 

Exploitation – Epic Fail?

Given the above, exploitation should consist of:

  1. Generate objects in javascript.
  2. Create a condition where the object is preselected for cleanup
  3. Trigger the Garbage Collector
  4. Delete the object.
  5. Crash

This is a race condition bug and we have not been able to produce reliable code to exploit it. I’m not sure if I will try further for a DoS. I have seen EIP over written (once in MANY debug iterations). I’ve also seen a few writes. This likely occurs do to quirks in the free-ing process and timing.

The details are complicated heap kung-fu. The GCGraphBuilder class implements the purple portion of the garbage cycle collector. Described as:

     64                 : // Purple nodes are *candidates* for being scanned. They are nodes we

      65                 : // haven't begun scanning yet because they're not old enough, or we're

      66                 : // still partway through the algorithm.

      67                 : //

      68                 : // XPCOM objects participating in garbage-cycle collection are obliged

      69                 : // to inform us when they ought to turn purple; that is, when their

      70                 : // refcount transitions from N+1 -> N, for nonzero N. Furthermore we

      71                 : // require that *after* an XPCOM object has informed us of turning

      72                 : // purple, they will tell us when they either transition back to being

      73                 : // black (incremented refcount) or are ultimately deleted.

 

Continues:

    78                 : // An XPCOM object is either scan-safe or scan-unsafe, purple-safe or

      79                 : // purple-unsafe.

      80                 : //

      81                 : // An object is scan-safe if:

      82                 : //

      83                 : //  - It can be QI'ed to |nsXPCOMCycleCollectionParticipant|, though this

      84                 : //    operation loses ISupports identity (like nsIClassInfo).

      85                 : //  - The operation |traverse| on the resulting

      86                 : //    nsXPCOMCycleCollectionParticipant does not cause *any* refcount

      87                 : //    adjustment to occur (no AddRef / Release calls).

      88                 : //

      89                 : // An object is purple-safe if it satisfies the following properties:

      90                 : //

      91                 : //  - The object is scan-safe.

      92                 : //  - If the object calls |nsCycleCollector::suspect(this)|,

      93                 : //    it will eventually call |nsCycleCollector::forget(this)|,

      94                 : //    exactly once per call to |suspect|, before being destroyed.

      95                 : //

      96                 : // When we receive a pointer |ptr| via

      97                 : // |nsCycleCollector::suspect(ptr)|, we assume it is purple-safe. We

      98                 : // can check the scan-safety, but have no way to ensure the

      99                 : // purple-safety; objects must obey, or else the entire system falls

     100                 : // apart. Don't involve an object in this scheme if you can't

     101                 : // guarantee its purple-safety.

     102                 : //

     103                 : // When we have a scannable set of purple nodes ready, we begin

     104                 : // our walks. During the walks, the nodes we |traverse| should only

     105                 : // feed us more scan-safe nodes, and should not adjust the refcounts

     106                 : // of those nodes.

     107                 : //

     108                 : // We do not |AddRef| or |Release| any objects during scanning. We

     109                 : // rely on purple-safety of the roots that call |suspect| and

     110                 : // |forget| to hold, such that we will forget about a purple pointer

     111                 : // before it is destroyed.  The pointers that are merely scan-safe,

     112                 : // we hold only for the duration of scanning, and there should be no

     113                 : // objects released from the scan-safe set during the scan (there

 

http://people.mozilla.com/~mnandigama/codecoverage_html/xpcom/base/nsCycleCollector.cpp.gcov.html

 

The exact problem occurs here:

PtrInfo*

 GCGraphBuilder::AddNode(void *s, nsCycleCollectionParticipant *aParticipant

                         IF_DEBUG_CC_PARAM(PRUint32 aLangID)

                        )

 {

     PtrToNodeEntry *e = static_cast<PtrToNodeEntry*>(PL_DHashTableOperate(&mPtrToNodeMap, s, PL_DHASH_ADD));

     PtrInfo *result;

     if (!e->mNode) //BUG HERE. e may be null, but we access e+0x4
				

        {

         // New entry.

         result = mNodeBuilder.Add(s, aParticipant

                                   IF_DEBUG_CC_PARAM(aLangID)

                                  );

         if (!result) {

             PL_DHashTableRawRemove(&mPtrToNodeMap, e);

 

Update: This bug – https://bugzilla.mozilla.org/show_bug.cgi?id=502687 also appears that the crash occurs on out of memory (OOM). Our testing and other reports indicate this bug can also appear when NOT OOM.

 

Lessons Learned

Going from random bug to some level of understanding in this case was difficult. A stack trace doesn’t really identify the problem. The source of the problem is in Javascript, which we have a lot of and it’s difficult to pinpoint the exact problem.

We laid out this paper, hopefully clearly. Our testing wasn’t so straightforward. Hopefully our process will help others. We suggest profiling the memory operations using runaway or following the heap to see what particular operations are most likely related to the memory corruption problem. Next do some simple searching in the Javascript to find possibly related strings. Next we suggest getting a solid stack trace and understanding of the disassembly to direct further testing. We largely failed at this. We approached the two ends (javascript and stack trace) as separate objectives and wasted much time.

If we were repeating a javascript bug, we would probably use a proxy such as WebScarab. Several times during the testing it would have been nice to modify javascript from the original offending website. A proxy with data meddling would have been beneficial. Simply copying the page and editing doesn’t work as the app is AJAX oriented.

 

Future

Like COM for windows. Sounds like fun.

Also we know of a bug in Flash that we’d like to leverage similar techniques to exploit.

Finally

 

 

 

Proposed fix (someone posted to Mozilla for bug 502687)

Bug 502687 - GCGraphBuilder::AddNode crashes on OOM.

 

diff --git a/xpcom/base/nsCycleCollector.cpp b/xpcom/base/nsCycleCollector.cpp

--- a/xpcom/base/nsCycleCollector.cpp

+++ b/xpcom/base/nsCycleCollector.cpp

@@ -1323,16 +1323,19 @@ GCGraphBuilder::~GCGraphBuilder()

 }

 

 PtrInfo*

 GCGraphBuilder::AddNode(void *s, nsCycleCollectionParticipant *aParticipant

                         IF_DEBUG_CC_PARAM(PRUint32 aLangID)

                        )

 {

     PtrToNodeEntry *e = static_cast<PtrToNodeEntry*>(PL_DHashTableOperate(&mPtrToNodeMap, s, PL_DHASH_ADD));

+    if (!e)

+        return nsnull;

+

     PtrInfo *result;

     if (!e->mNode) {

         // New entry.

         result = mNodeBuilder.Add(s, aParticipant

                                   IF_DEBUG_CC_PARAM(aLangID)

                                  );

         if (!result) {

             PL_DHashTableRawRemove(&mPtrToNodeMap, e);

References

https://developer.mozilla.org/En/Using_the_Mozilla_symbol_server

http://doxygen.db48x.net/mozilla-full/html/dd/ded/nsCycleCollector_8cpp.html

https://developer.mozilla.org/en/XPConnect

https://developer.mozilla.org/en/Using_XPCOM_in_JavaScript_without_leaking

https://developer.mozilla.org/en/interfacing_with_the_xpcom_cycle_collector

http://en.wikipedia.org/wiki/XUL

https://developer.mozilla.org/En/XUL

https://wiki.mozilla.org/CrashKill/QA

http://doxygen.db48x.net/mozilla-full/html/d1/d9a/classGCGraphBuilder.html#9eaee96ffac342f5df2a84561b481d4a

http://doxygen.db48x.net/comm-central/html/classnsCycleCollectionTraversalCallback.html

https://bugzilla.mozilla.org/show_bug.cgi?id=502687

 

 

 

 

 

What’s with “CyberWART”?

October 9th, 2009

I’m often asked, “What’s the deal with ‘CyberWART’”. The name is meant to provoke mixed reactions. The answer begins with it nominally means Cyber WARfare Technologies. But I also want to convey two other thoughts. First, the WART part – I think the Cyber Warfare as currently discussed is mostly absurd. The dialogs are usually unthoughtful and IMO ultimately a cyber wart. (Yes I like bad puns). The second aspect is the whole site is a bit of a contradiction. Cyberwart is officially a LLC but we have no clients. Yet I still try to publish interesting things. The plan was to publish exploits and more offensive technology — which has been limited as I previously worked on mainly cleared projects, but that is no longer the case. So that’s the insider info on CyberWART.

Cyberwart Trac

October 2nd, 2009

I have this problem… I never feel that code is good enough to be officially released. My code is usually an on-going POC soup that I’m constantly tweaking. As a result, I seldom post code as it’s “not ready yet”. I doubt I’m going to get past that, but I have a solution: Trac.

I’m going to start posting code to http://trac.cyberwart.com. Definitely don’t expect production code, but I like to think my prototypes are sometimes interesting.

SMB2

October 2nd, 2009

I wrote previously about the SMB2 vulnerability affecting Windows 7 and Windows 2008. Immunity successfully developed a working exploit. Later, a working exploit was put into Metasploit. FWIW, the bug isn’t that bad. By default remote access to the SMB service is blocked (for home users). Business are at a bit more risk. However, there’s yet another mitigation. As Dave Aitel writes:

Our assessment is that the exploit works by relying on some key magic
numbers – one of which is what redirects execution to the payload. In
some circumstances, this magic number is always the same – i.e. in
VMWare or in some specific hardware configurations. However, in many
situations (i.e. you don’t have the exact same hardware the exploit
expects) this number will be different, resulting in a bluescreen.

Therefore, there’s a significant hurdle still remaining for the fully public exploit. The bug also requires a lot of technical skill so I don’t imagine criminals will be likely to provide the exploit without decent compensation.

To the point, it’s a very cool bug but not an immediate extraordinary risk to most organizations.

UPDATE:

I might be wrong. I saw this blog post: http://www.abysssec.com/blog/2009/10/exploiting-vista-2008-using-smbv2-exploit/. It looks like he exploited a system in the wild using the Metasploit module. Unless the machine is VMWare deployment and/or someone trying to catch exploits, this seems inconsistent with the time estimate to bypass the magic number.

FFIEC Best Practice Audit

September 22nd, 2009

I saw this article today:

US court rules that bank failed to protect customer against fraud

Dan Raywood | Sep 22, 2009 9:55 AM

Could set precedent for banking sector.

The banking sector could face a major shake-up after a court in the US ruled that a bank failed to protect a user’s account against fraudulent access.

In a recent case, a US judge allowed Marsha and Michael Shames-Yeakel to bring a case against Citizens Financial Bank, who alleged that the bank failed to implement state-of-the-art security technology, as they were the victims of fraud perpetrated through their online bank account to the tune of $US26,500.

The US District Judge refused to grant summary judgement in favour of the financial institution, clearing the way for the court case to take place. In her judgement, Rebecca Pallmeyer stated: “In light of citizens’ apparent delay in complying with FFIEC security standards, a reasonable finder of fact could conclude that the bank breached its duty to protect Plaintiffs’ account against fraudulent access.”

Rik Ferguson, senior security advisor at Trend Micro, claimed that the case could have important ramifications across the US. He highlighted a 2005 FFIEC report entitled ‘Authentication in an internet banking environment’, that stated: “The agencies consider single-factor authentication, as the only control mechanism, to be inadequate for high-risk transactions involving access to customer information or the movement of funds to other parties.”

Ferguson said: “The sheer volume of personal banking data and the ease with which it can be accessed is staggering. Don’t for a moment think that cost or lack of skill is a barrier to entry into the shady world of ‘carding’ and online financial fraud.

“Logon details for online banking are usually sold priced as a percentage of the available balance on the account. Today, bank accounts are available online for as little as three per cent including personal, business and offshore accounts.”

He claimed that online banking in the US still tends to rely on simple username and password combinations, and in the rare cases where a confirmation number is required, this is often sent to the customer’s email account, which is also easy for a criminal to compromise.

US banks have traditionally used single factor authentication, while in Europe, two-factor authentication has been common for years.

Ferguson said: “The deployment of these kinds of technologies in Europe, along with the language issues, means that the US is considered ‘low-hanging fruit’ for online banking fraud, and until financial institutions invest in the necessary deterrent technology, it will remain so.

“That being said, though, two-factor authentication technology may not be familiar to even some European banking customers, because (as was the case with chip and PIN cards) certain European countries have also been guilty of tardiness in deploying security technologies for online banking. So, if your bank doesn’t require this additional security, you can bet that cybercriminals know this and that your bank and your account will be targets.”

He further claimed that it is worth remembering that you should not always rely on the goodwill of your financial institution to reimburse you for losses to cybercrime.

“An argument I have heard time and again from friends and acquaintances is ‘Why should I worry when the bank always reimburse any losses?’ If the losses to cybercrime ever become too much for UK banks for example, they can fall back on the provisions of their Banking Code which states ‘If you act without reasonable care, and this causes losses, you may be responsible for them’,” said Ferguson.

As a result I’m sure the big four auditor and all the various consultancies will begin offering FFIEC Best Practice Audits. They might come up with a better name, but that’s basically what they’re selling. I guess it’s a good day to be a pen tester.

Woe unto Microsoft

September 16th, 2009

From DailyDave:

http://www.immunityinc.com/ceu-index.shtml has a nice reliable remote
Vista SP2 SMBv2 exploit now. I guess one thing that always strikes me
about this sort of thing is that while most people associate exploits to
a single person, the reality of the situation is that you want a pretty
large team working on it.

In this case we had Skylar Rampersaud, Nicolas Pouvesle, Gustavo Scotti,
and last but not least, Kostya Kortchinsky.

Or as I used to say at the Fort, “One team, one parking lot.” :>

Thanks,
Dave Aitel
Immunity, Inc.

More info at:

  1. http://expertmiami.blogspot.com/2009/09/smbv2-exploit-video.html
  2. http://vrt-sourcefire.blogspot.com/2009/09/smbv2-quotes-dos-quotes.html

Dealing with Cyber Crime

September 11th, 2009

Recently a family member discovered an unknown charge of approximately $2000 on her debit card. The charge was from Amazon business services and was otherwise myterious. Amazon wasn’t overly communicative but after several days they finally got back with an informative email. They wrote:

Thank you for contacting us at Amazon.com.

I have investigated the charge made to your credit card and confirmed that it was the result of the unauthorized use of your credit card number.  Please note that we have closed the Amazon.com account through which the unauthorized purchase was made.

To receive a refund, please contact the bank that issued your credit card and report the charge as unauthorized.  The bank will send paperwork for you to sign to verify any unauthorized charge.  Your bank will then pass the appropriate paperwork on to us.

It may be helpful for you to know that under U.S. banking regulations (Regulation Z – the Truth in Lending Act), in most cases, you have the right to withhold the unauthorized charge from payments to your bank until the matter is resolved.  In addition, your bank must resolve the matter within 90 days of its receipt of notice from you.

We recommend that you have this card cancelled and reissued to prevent any future unauthorized use.  We also encourage you to report the crime to your local law enforcement officials.  Although we are not permitted to provide you with any further details about the unauthorized charge to your card, we will gladly cooperate with your bank or local law enforcement, should they contact us.

Again, thank you for contacting us at Amazon.com.

Best regards,

Ultimately, I find this email infuriating. Without evidence to the contrary, Amazon inappropriately charged the money. Their solution, is basically to complain to the bank. Now, the bank readily resolved the issue, but my concern is that this is Amazon’s mistake and they seem to take little responsibility. Further, they write: “Although we are not permitted to provide you with any further details about the unauthorized charge to your card, we will gladly cooperate with your bank or local law enforcement, should they contact us.”

They’re not permitted to provide you with further details? Seriously? Their company makes an illegal $2000 charge and they think they don’t have to own up to what happened?

Righteous indignation doesn’t go very far, so I thought I’d research what actual recourse I might have. Several options came to mind:

  1. Talk to local police and pursue charges against Amazon. This is viable, as a crime has been committed and the only available evidence points toward Amazon. I don’t think Amazon intentionally commited a crime, but since they’re not owning up otherwise it’s an option. The problem is that local police are often barely computer literate.
  2. Option 2, file a civil suit for financial losses associated with the incident. They money itself is returned, but there could be over draft fees or costs associated with needing credit monitoring. One could always sue to recover those expenses.

I think both options are potentially viable, but they’re also distasteful and seem excessive to resolve what should be a simple problem: We want more information on what happened, what data was compromised, and why it was compromised.

I think Maryland has come to the rescue. I found the Maryland Consumer Protection – Personal Information Act of 2007. This act basically defines data destruction and compromise rules. In particular it has the following clauses:

  •  
    • TO PROTECT PERSONAL INFORMATION FROM UNAUTHORIZEDACCESS, USE, MODIFICATION, OR DISCLOSURE, A BUSINESS THAT OWNS ORLICENSES PERSONAL INFORMATION OF AN INDIVIDUAL RESIDING IN THE STATESHALL IMPLEMENT AND MAINTAIN REASONABLE SECURITY PROCEDURES ANDPRACTICES THAT ARE APPROPRIATE TO THE NATURE OF THE PERSONALINFORMATION OWNED OR LICENSED AND THE NATURE AND SIZE OF THEBUSINESS AND ITS OPERATIONS.
    •  

    • (B) (1) A BUSINESS THAT OWNS OR LICENSES COMPUTERIZED DATATHAT INCLUDES PERSONAL INFORMATION OF AN INDIVIDUAL RESIDING IN THE STATE, WHEN IT DISCOVERS OR IS NOTIFIED OF A BREACH OF THE SECURITY OF A SYSTEM, SHALL CONDUCT IN GOOD FAITH A REASONABLE AND PROMPT INVESTIGATION TO DETERMINE THE LIKELIHOOD THAT PERSONAL INFORMATION OF THE INDIVIDUAL HAS BEEN OR WILL BE MISUSED AS A RESULT OF THE BREACH.
    •  

    • (2) EXCEPT AS PROVIDED IN SUBSECTION (D) OF THIS SECTION, THE NOTIFICATION REQUIRED UNDER PARAGRAPH (1) OF THIS SUBSECTION SHALL BE GIVEN AS SOON AS REASONABLY PRACTICABLE AFTER THE BUSINESS DISCOVERS OR IS NOTIFIED OF THE BREACH OF THE SECURITY OF A SYSTEM.
    •  

    • (3) A BUSINESS THAT IS REQUIRED TO NOTIFY AN OWNER OR LICENSEE OF PERSONAL INFORMATION OF A BREACH OF THE SECURITY OF A SYSTEM UNDER PARAGRAPH (1) OF THIS SUBSECTION SHALL SHARE WITH THE OWNER OR LICENSEE INFORMATION RELATIVE TO THE BREACH.

The full text is available at: http://mlis.state.md.us/2007RS/chapters_noln/Ch_532_hb0208E.pdf

It’s important to obtain this information as there’s no reasonable way to to mitigate the problem otherwise. Yes, the Amazon account was close and the credit card has been changed, but some attack vector enabled the compromised and that needs to be fixed. Without more information, we have no idea if the fault was a bot on a home pc, a guessed password, XSS, SQL Injection of amazon, a reused password, or some other problem. The plan now is to reply to Amazon citing the above Act and requesting all relevant information.

Quicksort Beaten

September 11th, 2009

True geeks should appreciate this from:

http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628

Hello All,

I'd like to share with you new Dual-Pivot Quicksort which is
faster than the known implementations (theoretically and
experimental). I'd like to propose to replace the JDK's
Quicksort implementation by new one.

Description
-----------
The classical Quicksort algorithm uses the following scheme:

1. Pick an element P, called a pivot, from the array.
2. Reorder the array so that all elements, which are less than
    the pivot, come before the pivot and all elements greater than
    the pivot come after it (equal values can go either way). After
    this partitioning, the pivot element is in its final position.
3. Recursively sort the sub-array of lesser elements and the
    sub-array of greater elements.

The invariant of classical Quicksort is:

[ <= p | >= p ]

There are several modifications of the schema:

[ < p | = p | > p ]  or  [ = p | < p | > p | = p ]

But all of them use *one* pivot.

The new Dual-Pivot Quicksort uses *two* pivots elements in this manner:

1. Pick an elements P1, P2, called pivots from the array.
2. Assume that P1 <= P2, otherwise swap it.
3. Reorder the array into three parts: those less than the smaller
    pivot, those larger than the larger pivot, and in between are
    those elements between (or equal to) the two pivots.
4. Recursively sort the sub-arrays.

The invariant of the Dual-Pivot Quicksort is:

[ < P1 | P1 <= & <= P2 } > P2 ]

The new Quicksort is faster than current implementation of Quicksort
in JDK (L. Bentley and M. Douglas McIlroy) and classical Quicksort.

The full description of the Dual-Pivot Quicksort you can find
on my page: http://iaroslavski.narod.ru/quicksort

Performance tests
-----------------
Here is result of running on different types of input data:

Client VM                          all    85%  organ  0..1          0..4
              random ascend descend equal  equal pipes random 010101 random
Dual-Pivot:  16.83  5.31    5.47   0.35  0.68  10.59  1.06   1.02   2.18
   Bentley's:  19.77  9.08   10.13   0.63  1.12  13.22  1.63   1.08   2.49

Server VM                          all    85%  organ  0..1          0..4
              random ascend descend equal  equal pipes random 010101 random
Dual-Pivot:  23.94  6.68    6.63   0.43  0.62  17.14  1.42   1.96   3.41
   Bentley's:  25.20 10.18   10.32   2.07  1.33  16.72  2.95   1.82   3.39

The a lot of other tests have been run under client and server mode.
The most interesting is BentleyBasher test framework. It runs battery
of tests for all cases:

{ 100, 1000, 10000, 1000000 } x
{ sawtooth, rand, stagger, plateau, shuffle } x
{ ident, reverse, reverse_front, reverse_back, sort, dither}

where

100, ... , 1000000 - array length

sawtooth: x[i] =i%m
rand: x[i] = rand() % m
stagger: x[i] = (i*m + i) % n
plateau: x[i] = min(i, m)
shuffle: x[i] = rand()%m? (j+=2): (k+=2)

ident(x) - a copy of x
reverse(x, 0, n) - reversed copy
reverse_front(x, 0, n/2) - front half reversed
reverse_back(x, n/2, n) - back half reversed
sort(x) - an ordered copy
dither(x) - add i%5 to x[i]

Here is the result of execution:
Server VM: http://spreadsheets.google.com/pub?key=t_EAWUkQ4mD3BIbOv8Fa-AQ&output=html
Client VM: http://spreadsheets.google.com/pub?key=tdiMo8xleTxd23nKUObcz0Q&single=true&gid=0&output=html

Mathematical investigations
---------------------------
It is proved that for the Dual-Pivot Quicksort the average number of
comparisons is 2*n*ln(n), the average number of swaps is 0.8*n*ln(n),
whereas classical Quicksort algorithm has 2*n*ln(n) and 1*n*ln(n)
respectively. Full mathematical proof see in attached proof.txt
and proof_add.txt files. Theoretical results are also confirmed
by experimental counting of the operations.

Diff between current and new implementation of Quicksort
--------------------------------------------------------

Here is the link to the diff for java.util.Arrays class:
http://cr.openjdk.java.net/~alanb/6880672/webrev.00

If you like to look and play with new algorithm,
please, take attached class DualPivotQuicksort.java

Feedback
--------

Also I'd like to share a feedback from Joshua Bloch and
Jon Bentley who spent a lot of time investigating this
algorithm, who gave me many advices and tips how to
make new Quicksort better.

Simply awesome!

Statistical Traffic Volume Analysis

September 11th, 2009

IDS systems are fragile. As a penetration tester, I have flown under the radar countless times. Most IDS systems depend on known bad signatures (which are often poorly written). They look for things like large NOP sleds, malicious EXEs, IRC traffic, and SYN scans. Some systems are much better, but many aren’t.

Once an attacker has access to a system, he needs to maintain access. My favorite technique for accomplishing this is to spawn a new thread by injecting it into Iexplorer. This works well, as all Windows systems have Internet Explorer and in all but the fewest cases, it has ready Internet access. From there, it’s easy enough to use Microsoft’s API to communicate via SSL to a remote C&C webserver. In 5 years of penetration testing, I’ve never seen this detected.

How do you detect C&C from the network when the traffic is inside a SSL tunnel? There’s no good solution to this problem. However several ideas come to mind:

  1. Create netflows for all traffic. Measure the sent and received data. Use statistics to find anomalous traffic. In general, botnets will often send more data than typical web traffic. Compute statistics on the traffic and identify anomalies that need further research. This has been previously discussed. The buzz term is statistical traffic volume analysis. However, I haven’t seen a tool to test this approach, so I’m currently writing a simple prototype.
  2. Create netflows for traffic. Utilize the previous statistical metrics, but combine this with network traffic relationships. To do this, implement directed graphs. For instance, you might detect a compromised machine via the former method, but know you need to determine what machines it attacked — not just communicated with. Often, the attacking machine will initiate a probe or (blindly) initiate an attack, but in either case initiate the connection. As discussed above, I generally don’t use a traditional shell or reverse shell. I want a known good service to callback. So a second connection is often initiated from the attacked machine to the attacker. From this point, the traffic volume statistics should also apply. The attacker will datamine or pivot through the attacked machine causing unusual (for http) traffic patterns. The direct graph can be used to reduce the traffic for analysis into subsets where more precise statistics can be computed.

These methods are far from perfect. In fact, I wouldn’t even say they’re good, but then again there are few good methods I’ve heard for tackling this type of problem. The only one that comes to mind is a statistical analysis on traffic times. For instance, looking for regular call-back intervals. I find this difficult, as many implants will call back every couple of days or at best every several hours. This is a difficult challenge to program as the memory requirements to log every packet time, source ip, dest ip, and size for any length of time on a fairly large network are enormous. Additionally, the mitigation is fairly trivial — add a random delay of up to several hours.

In the next week or two, I hope to release prototype code (netent) that implements the statistical traffic analysis. Currently it’s working fairly well, but I need to tune it better, determine how to display data, and clean up the code. It currently (successfully) identifies gmail and facebook as anomalous. This makes sense as both are highly chatty Web 2.0 applications that send a lot data compared to other sites — where you typically send very little and read a good deal. Therefore, the traffic is anomalous in a manner similar to compromised hosts. The challenge going forward is reducing benign anomalous traffic while still capturing possibly malicious anomalies.

Security Taken Aback

September 11th, 2009

hacking-tngI was surprised a week or two ago when Hacking: The Next Generation showed up on Amazon. I generally anxiously await most new security titles months ahead of time. But this one stayed under the radar. Hacking immediately got my attention both because of the title and because the lead author, Nitesh Dhanjani, wrote Network Security Tools. The latter book is an excellent work with practical examples and lots of code that will enable you to quickly jump into the internals of many security tools.

Hacking sets ambitious goals. The premise seems to be evolving the art of penetration testing to include recent blended attacks. If you’re familiar with HD Moore’s Tactical Exploitation or other similar talks you’ll see a lot of familiar material. Additionally, the book covers ‘blended’ or ‘chained’ attacks. Overall, the book succeeds in articulating conceptual attack vectors. Unfortunately, it falls short in execution. The book lacks details. Excluding Chapter 2, there’s very little code. So while the concepts are great, the book doesn’t enable the reader to readily execute the discussed attacks.

Several of the chapters have extremely interesting topics. Chapter 6: Abusing Mobile Devices could have been awesome. But rather than discussing mobile software security, it focuses on web sites of mobile provider websites and theoretical security weaknesses. There’s very little sustenance that the reader can use to hack or evaluate a particular mobile device.

Chapter 7: Infiltrating the PhishingUnderground is similar to the previous chapter. It looks at real-world attacks, but it doesn’t give a security professional the tools to go execute phishing attacks. The chapter belongs more in the book Crimeware.

Overall, I’m disappointed in the book. If you’re new to penetration testing or the security community this might be a good book. But if you’re looking for something that can immediately be hands on to execute the latest types of attacks, Hacking falls short. I think the cover accurately describes the book… it’s a cool looking pirate ship. However, if you look closely, you’ll see that the ship is taken-aback (meaning it’s sailing too close to the wind and the captain has lost control), and is in danger of being dismasted. Likewise, the book looks cool, and could be cool, but something went wrong.

Note: the book isn’t available in paperback yet. It is available on safari.oreilly.com and on the Kindle.

Even Security Engineers Make Mistakes

September 10th, 2009

I’ve been writing a basic packet capture utility to implement a couple ideas I’ve had floating around on statistical IDS. Earlier today I had an odd bug. I had a pointer to a vector that after “a while” would randomly be de-referenced. I thought maybe libpcap was causing a concurrency issue and the structure was accidentally de-referenced then. After making some alterations I decided that was unlikely.

Here’s an example piece of what I did:


vector <omg *> * foo()
{
   vector <omg *> v;
   while(data)
    {
      //manipulate data
      v.push_back(data);
     }
return &v;
}

 int main()
 {
   vector * vptr;
   while(things_to_do)
   {
      vptr = foo();
      //process vptr
   }
}

Don’t nitpick pseudo code for memory management. The problem is actually fairly obvious if you think about it. v formatted in foo() is a stack variable. So it works fine inside the function. It also works fine later, for a while, but then that stack space is re-used and your pointer is dereferenced. As a result you get a bug that will eventually appear no matter how many sanity checks you perform later. The right way to do this is as follows:

vector <omg *> * foo()
{
   vector <omg *> * v = new vector <omg *>;
   if(!v)
      return NULL;

   while(data)
    {
      //manipulate data
      v-%gt;push_back(data);
    }

return v;
}

This works, because new creates a heap allocation that’s only destroyed with an explicit free/delete. So the repaired code works fine without randomly being de-allocated. The only trick from

A Growing Trend

August 31st, 2009

This morning the Apache website was compromised. They gave a quick update on the situation in the following post:

On August 27th, starting at about 18:00 UTC an account used for automated backups for the ApacheCon website hosted on a 3rd party hosting provider was used to upload files to minotaur.apache.org.  The account was accessed using SSH key authentication from this host.

<snip>

The attackers created several files in the directory containing files for www.apache.org, including several CGI scripts.  These files were then rsynced to our production webservers by automated processes.  At about 07:00 on August 28 2009 the attackers accessed these CGI scripts over HTTP, which spawned processes on our production web services.

At about 07:45 UTC we noticed these rogue processes on eos.apache.org, the Solaris 10 machine that normally serves our websites.

Within the next 10 minutes we decided to shutdown all machines involved as a precaution.

This type of attack is getting more press recently. Website for several members of the US House were compromised recently as reported here in the Washington Post. Additionally, many security professionals were made keenly aware of the risks of shared hosting.

These attacks are interesting for several reasons. The most obvious is that they are very high profile targets. Hacking important sites gets attention. Second, it’s a common assumption that 3rd party hosting is safe (or at least mitigates risk by moving it to a 3rd party). Most security professionals accept that all websites will be eventually compromised so move them elsewhere. The problem is that most organizations cannot maintain the necessary discipline to keep everything important off their 3rd party hosted sites. When you consider that constituents and clients require access to various documents and web apps, it’s a considerable problem. Additionally as web2.0 becomes more popular and the even more invasive “cloud computing” becomes a standard business process, organizations will face further security problems. Thus while some risk is shifted by 3rd party hosting, some isn’t and security complexities increase.

The reason that this subject is particularly interesting to me is that 3rd party hosts are almost always excluded from penetration tests. Most organizations have to exclude 3rd party hosts as they’re generally shared and acquiring permission is virtually impossible. That of course leads to numerous problems.

I’d make a few suggestions:

  1. Push assessment of 3rd party hosting as far as possible. The line between ethical and out of bounds is blurry, but it’s a line pen testers are familiar with. If you’re working with a reputable firm that you trust, let them know they can investigate your corporate resources hosted on 3rd parties (careful tho that you can only authorize assessing your resources)
  2. In every network (external/internal) pen test, include web apps as in scope. This shouldn’t replace regular, more comprehensive web app assessments, but including web apps in the RoE can enable an attacker to leverage realistic attack vectors — including third party hosted web apps.
  3. Have the pen testers go ahead and buy an account on the hosting provider. Generally web hosting is cheap. For less than $50, they can get an account with the same provider and determine what risks might be of interest.

I’m obviously not a lawyer, so please don’t take my word for what is legal or contractually acceptable. But it’s the job of pen testers to realistically portray the threat of real-world attackers. To do this we need to adapt techniques we see in use.

Incident Response

August 26th, 2009

Recently, I’ve been thinking about the role of incident response. This isn’t typically an area that I directly deal with, but is often related to most of my work. I’ve spent untold hours writing network sensor software, and even more trying to avoid detection during pen tests. I’ve even been the handler on a few incidents, but it’s not a direct activity in my daily work.

That being said, I’m surprised with the documentation that’s available on the topic. Most books are at least 5 years old. Various websites have quick briefs (much like this one) and there are a few government funded guides there are at least 100 pages. I’m writing this post as general guidance and to link some of the better information I found.

First, as every guide states, have a plan! This is tremendously important. An incident response without a plan is likely to result in massive problems. Why is the plan so important? A smooth response is the easy answer. The more accurate is that when an incident occurs decision makers are going to be afraid. They’ll make poor decisions or want to meet with several executives. This is simply not practical during a time sensitive event.

Many guides will detail the various hardware, skill sets, and other random things you might need, etc. I’m not going get into that. There are a few basics that need to be established at the policy level and then the team should implement a detailed plan. The high level requirements are:

  1. Guidance on declaring an incident
  2. How to appoint an Incident Manager (IM)
  3. Authorities and budget of the IM
  4. The business objectives  the IM

Officially declaring an incident seems like an overly formal event. But business processes are going to change, hours are going to be burned, and someone needs to get in control. Therefore there needs to be an official process for this happening. This process can simply be an email decision by the CIO/CISO/CEO/whoever but it needs to happen. It also needs to happen in time. Responding to an incident after sys admins have started making furious changes is only going to complicate the situation.

Appointing the Incident Manager is key. Various guides will call this person the Incident Handler. I believe IM is a better term as this person’s key responsibility is coordinating technical staff, executives, and communication. This person needs to be solid under pressure, senior enough to lead a team and to communicate with executives. Technical abilities should be significant enough to understand the technology and attacks, but this person doesn’t need to reverse a themidia packed custom binary. The IM should immediately select a technical lead/deputy that can handle the technical details. While the IM is meeting with the CIO someone needs to make sure the technical work is getting done!

As I said, authorities change during an incident. The most basic change is that all IT decisions now go through the IM. An incident cannot be handled if forensic staff are trying to image a drive that sys admins are trying to repair, or if web developers are fixing the SQL injection on the production host. I’ve seen both. Fighting the technical details such as these or arguing with an IT/operations manager just wastes time. The other common problem, is the fight with change management. High availability organizations likely have a week or longer review and testing process for change management. If the executives want faster change, the IM needs authority to do so and the business needs to understand the associated risks.

Everything needs to be tied back to business goals. Contrary to what geeks think, myself included, the organization does not exist for the network. There are business needs that need to be foremost in mind. Some businesses will want to fully investigate an incident, press charges, sue, etc. Others will have complex legal reporting requirements. Most will simply want the incident fixed as quickly as possible. Knowing what the organization wants and when is key. The job of the IM is take the current situation and move it best available business objective, which implies the business objectives are known.

More detailed planning should define the standard operating procedures for how different types of incidents should be handled. These should begin with high level business objectives and classes of attacks. In my opinion these are important but less so. The particulars of an incident tend to make sticking to a detailed plan difficult. Keeping these guides flexible is important. I find it best to notate the tasks that need to be accomplished and important checkpoints required to ensure every one is on the same page and that stake holders are informed. The guides should also keep important contact information centralized.

The exception to flexible planning is the execution of common tasks. These common tasks occur as part of large incidents or as the whole response to low level incidents, meaning a less formal more common incident requiring a response but not full escalation. Every team should be able to rapidly:

  1. Block an external IP address at the gateway
  2. Monitor/log all traffic for a particular IP addresses anywhere in the network for analysis
  3. Redirect an internal or external host to an untrusted subnet/sandbox
  4. Block an internal hosts traffic by IP, MAC Address, and wall jack
  5. Deploy a new IDS/IPS signature
  6. Deploy patches/software to a server or workstation

The six above items are core tasks any incident response team should be able to execute in minutes. Obviously one wouldn’t want to decide to push software out in a few minutes, but once the decision is made software should begin moving out as quickly as possible.

Any organization building the basic policy and operating procedure guides is well on its way to effectively responding to incidents. The next step is to develop an efficient process to quickly execute the common six activities above. The two elephants in the room are: detecting the incident and forensics/reverse-engineering. These items require extreme specialization. Detection is about monitoring your network and hosts, establishing a baseline, and observing malicious traffic. In my experience breaking into systems this is an incredibility hard task. As bad as these products can be, the only real answer for an enterprise is to use them, love them, and tweak them. Reverse-engineering is easier — just go find a guy that owns a license to IDA Pro and launches into a rant when comparing Windbg with Olly/Imdbg.

If you look at recent posts, I’m sure you can figure out where I stand on the last item.

Useful Links:

  • http://csrc.nist.gov/publications/nistpubs/800-61/sp800-61.pdf
  • http://www.cert.org/csirts/Creating-A-CSIRT.html
  • http://en.wikipedia.org/wiki/Computer_security_incident_management
  • http://www.sei.cmu.edu/pub/documents/03.reports/pdf/03hb002.pdf

Custom Malware

August 25th, 2009

Here’s a link over to Adam’s site where he links a quick report we did on some malware. The malware itself isn’t especially interesting. Conficker definitely blows it out of the water. What is interesting is that it appears to be a custom implant used to target a particular financial organization. This isn’t a bot that happened to get into a network, this is someone that wrote software and is carefully maintaining access to the network.