Covert communication allows two or more parties to communicate without being detected by other parties. Steganography is a means by which entities communicate covertly by hiding information within some media. In the ideal case, the data are hidden so well that only the intended receiver can decipher the message and no change in the media is observable. Traditionally, steganography mechanisms have been implemented in digital media such as image, video, and audio files. A recent trend in security research is network steganography, where information is hidden within timing channels and storage channels.
In this paper, the network blending communication system (NBCS) is described. This system provides a novel approach aimed at protecting against attackers by hiding node identities and communication. NBCS allows secure communication to occur between covert nodes in internal networks by hiding messages in the payload fields of multiple network streams of overt nodes.
Network streams are connections between nodes that update above some threshold rate. A connection consists of the one-way packets between two nodes that use the same protocol and have equal length. The NBCS treats streams in a network analogously to a video stream. A snapshot or window of the network traffic is like an image of a video stream. Connections are like pixels in an image. Like digital steganography, NBCS attempts to hide information in locations that will have the least noticeable effect on the original media. The NBCS is configurable and automatically chooses locations to blend covert data. In addition, communicators need not reveal identities to communicate.
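The definition of a stream above can be sketched in a few lines of Python. This is a minimal illustration, not the NBCS implementation: the `Packet` type, the `find_streams` function, and the interpretation of "update rate" as packets per second within a fixed window are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative only."""
    timestamp: float  # seconds since the start of the capture window
    src: str
    dst: str
    protocol: str
    payload: bytes


def find_streams(packets, window_seconds, min_rate):
    """Group packets into one-way connections and keep the fast ones.

    A connection is keyed by (src, dst, protocol, payload length), so all
    of its packets travel one way, use the same protocol, and have equal
    length. A connection counts as a stream when its packet rate over the
    window exceeds min_rate (assumed unit: packets per second).
    """
    counts = defaultdict(int)
    for p in packets:
        counts[(p.src, p.dst, p.protocol, len(p.payload))] += 1
    rates = {key: n / window_seconds for key, n in counts.items()}
    return {key: rate for key, rate in rates.items() if rate > min_rate}
```

Keying on payload length as well as addresses and protocol mirrors the equal-length requirement, so two flows between the same pair of nodes with different packet sizes are treated as separate connections.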
In the field of computer security, covert communication is usually seen as adversarial, but from another perspective, it can be seen as a way to communicate securely by hiding data from a malicious third party, e.g., an inside attacker. In this light, instead of making data unreadable using encryption, it may be possible to hide from an adversary a secure network infrastructure (consisting of several node endpoints) in network traffic.
Current covert communication techniques, using storage and timing channels, are not well suited to this task. Storage channels typically use properties of a protocol that are ignored, such as unused header fields. In this case, once the vulnerability of the protocol is documented, an attacker may uncover the data and breach the communication. Timing channels work by purposely modifying timing mechanisms on a network, such as packet arrival times. In general, timing channels are difficult to detect, but they provide low throughput. In this paper we describe a novel blending technique that is capable of using as carriers the payload fields of multiple connections, including audio, video, and voice over IP (VoIP) streams.
To send covert data, the technique executes in three main phases. In the analysis phase, the covert sender analyzes traffic in promiscuous mode. In the selection phase, the sender selects locations to place covert data. In order to blend with active network traffic, the sender selects connections with high data rates and a sufficient amount of randomness in the payload. Within these connections, the packets with the highest randomness (considered injection points) are duplicated and slightly modified to include the covert data. Finally, in the sending phase, the modified packets are injected into the network (still containing the original source and destination addresses). By analyzing the same traffic, the covert receiver identifies the injection points and extracts the covert messages. We implemented the blending covert method (BCM) tool and evaluated it using user datagram protocol (UDP) connections under two network loads. Our results show that the technique works with limited data loss, and we analyze the tradeoffs between throughput and detectability.
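The selection phase can be illustrated with a small sketch. The text does not say how randomness is measured; the example below assumes byte-level Shannon entropy as the measure, and the function names `shannon_entropy` and `pick_injection_points` and the parameters `min_entropy` and `k` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    n = len(payload)
    counts = Counter(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def pick_injection_points(payloads, min_entropy, k):
    """Return indices of the k most random payloads above a threshold.

    Assumption: "sufficient randomness" means entropy >= min_entropy.
    Ties are broken by original packet order so that a receiver running
    the same procedure on the same traffic reaches the same choice.
    """
    scored = [(shannon_entropy(p), i) for i, p in enumerate(payloads)]
    eligible = [(h, i) for h, i in scored if h >= min_entropy]
    eligible.sort(key=lambda item: (-item[0], item[1]))
    return [i for _, i in eligible[:k]]
```

Because both functions are deterministic, a sender and a receiver that observe the same packets would compute the same injection points, which is the property the receiver relies on to locate covert data.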
A large amount of research is focused on identifying malware. Once identified, the behavior of the malware must be analyzed to determine its effects on a system. This can be done by tracing through a malware binary using a disassembler or logging its dynamic behavior using a sandbox (virtual machines that execute a binary and log all dynamic events such as network, registry, and file manipulations). However, even with these tools, analyzing malware behavior is very time consuming for an analyst. In order to alleviate this, recent work has identified methods to categorize malware into “clusters” or types based on common dynamic behavior. This allows a human analyst to look at only a fraction of malware instances–those most dissimilar. Still missing are techniques that identify similar behaviors among malware of different types. Also missing is a way to automatically identify differences among same-type malware instances to determine whether the differences are benign or are the key malicious behavior.
The research presented here shows that a wide collection of malware instances have common dynamic behavior regardless of their type. This is a first step toward enabling an analyst to more efficiently identify malware instances’ effects on systems by reducing the need for redundant analysis and allowing filtration of common benign behavior.
This research uses the publicly available Reference Data Set that was collected over a period of three years. Malware instances were identified and assigned a type by six anti-malware scanners. The dataset consists of dynamic trace events of 3131 malware instances generated by CWSandbox.
For this research, the dataset is separated into two sets: small and large. The small set contains 2071 instances of malware that are less than 100 KB in size. The large set contains 1060 instances of malware that are between 100 KB and 3.4 MB in size.
In order to measure the common behavior between the small and large sets, common event sequences within each malware instance in the small set are identified using a modified version of the longest common substring algorithm. Once identified, all appearances of these common event sequences are removed from the large set to determine shared behavior. Most common sequences are between 2 and 60 events long. Results indicate that, on average, the large set instances share 96% of event sequences when sequences of length 2 and higher are used, 66% with sequences of length 6 and higher, and 50% with sequences of length 12 and higher. This indicates that an analyst’s workload can be largely reduced by removing common behavior sequences. Furthermore, it shows that malware instances may not always fall into exclusive categories. It may be more beneficial to instead identify behaviors and map them to malware instances, for example, as with the Malware Attribute Enumeration and Characterization (MAEC).
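The idea of finding shared event runs and measuring how much of a trace they cover can be sketched as follows. This uses the classic dynamic-programming longest common substring over lists of events, not the modified version referred to above, whose details are not given here; the function names and the coverage measure are assumptions for illustration.

```python
def longest_common_run(a, b):
    """Longest run of consecutive events shared by traces a and b.

    Classic O(len(a) * len(b)) dynamic program; each row stores the
    length of the common run ending at the current pair of positions.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def shared_fraction(trace, common_runs):
    """Fraction of events in trace covered by any occurrence of a run."""
    if not trace:
        return 0.0
    covered = [False] * len(trace)
    for run in common_runs:
        n = len(run)
        if n == 0:
            continue
        for start in range(len(trace) - n + 1):
            if trace[start:start + n] == run:
                for k in range(start, start + n):
                    covered[k] = True
    return sum(covered) / len(trace)
```

Traces and runs are plain Python lists of event labels here. Filtering `common_runs` by a minimum length before calling `shared_fraction` would give the kind of per-length breakdown reported above.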
Future efforts may look into attaching semantic labels to long sequences that are common to many malware instances in order to aid the analyst further.
People in dialog use a rich set of nonverbal behaviors, including variations in the prosody of
their utterances. Such behaviors, often emotion-related, call for appropriate responses, but
today’s spoken dialog systems lack the ability to do this. Recent work has shown how to
recognize user emotions from prosody and how to express system-side emotions with prosody,
but demonstrations of how to combine these functions to improve the user experience have
been lacking. Working with a corpus of conversations with students about graduate school,
we analyzed the emotional states of the interlocutors, utterance by utterance, using three
dimensions: activation, evaluation, and power. We found that the emotional coloring of the
speaker’s utterance could be largely predicted from the emotion shown by her interlocutor
in the immediately previous utterance. This finding enabled us to build Gracie, the first
spoken dialog system that recognizes a user’s emotional state from his or her speech and
gives a response with appropriate emotional coloring. Evaluation with 36 subjects showed
that they felt significantly more rapport with Gracie than with either of two controls. This
shows that dialog systems can tap into this important level of interpersonal interaction using
Although spoken dialog systems are becoming more widespread, their application is today
limited largely to domains involving simple information exchange. To enable future applications,
such as persuasion, new capabilities are needed. One barrier to the creation of
such applications has been the lack of methods for building rapport between spoken dialog
systems and human users, and more generally the inability to model the emotional and
interpersonal aspects of dialog. This dissertation aims to address that limitation.
A corpus of persuasive dialogs in which a graduate coordinator informed undergraduate
students about the graduate school option was analyzed. Although much of each
dialog was involved in conveying factual information, there was also a heavy use of what
appear to be rapport-building strategies. This seemed to occur through emotional coloring
of the utterances of both coordinator and students as heard in prosodic variation, including
variation in pitch, timing, and volume.
Some of these rapport-building strategies were modeled and implemented in a spoken
dialog system named Gracie (Graduate Coordinator with Immediate-Response Emotions).
Gracie is the first dialog system that uses emotion in voice to build rapport with users.
This is accomplished by first detecting emotions from the user's voice, not classic emotions
such as sadness, anger, and joy, but the more subtle emotions that are more common
in spontaneous conversations. These subtle emotions are described with a dimensional
approach, using the three dimensions of activation (active/passive), evaluation
(positive/negative), and power (dominant/submissive). Once the user's emotional state is recognized,
Gracie chooses an appropriate emotional coloring for the response.
To test the value of such emotional responsiveness, an experiment with 36 subjects
examined whether a spoken dialog system that recognizes human emotion and reacts with
appropriate emotion can help gain rapport with humans. Users felt significantly more
rapport with Gracie than with the controls, and in addition, users significantly preferred Gracie
to the other two systems. This suggests that dialog systems that attempt to connect to
users should vary their emotional coloring, as expressed through prosody, in response to
the user's inferred emotional state.
When people speak to each other, they share a rich set of nonverbal
behaviors such as varying prosody in voice. These behaviors,
sometimes interpreted as demonstrations of emotions,
call for appropriate responses, but today’s spoken dialog systems
lack the ability to do so. We collected a corpus of persuasive
dialogs, specifically conversations about graduate school
between a staff member and students, and had judges label all
utterances with triples indicating the perceived emotions, using
the three dimensions: activation, evaluation, and power. We
found immediate response patterns, in which the staff member
colored her utterances in response to the emotion shown by the
student in the immediately previous utterance, and built a predictive
model suitable for use in a dialog system to persuasively
discuss graduate school with students.
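As a rough illustration of what an immediate-response model could look like, the sketch below fits one least-squares line per emotion dimension, predicting the response value from the same dimension of the previous utterance. The form of the model is an assumption made for this example; the text above does not specify it, and the names `fit_line`, `fit_response_model`, and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one dimension."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def fit_response_model(pairs):
    """Fit one line per dimension (activation, evaluation, power).

    pairs: list of (previous_triple, response_triple), each triple being
    three numeric labels. Assumption: dimensions are fitted independently.
    """
    model = []
    for d in range(3):
        xs = [prev[d] for prev, _ in pairs]
        ys = [resp[d] for _, resp in pairs]
        model.append(fit_line(xs, ys))
    return model


def predict(model, prev):
    """Predict the response triple for a previous-utterance triple."""
    return tuple(s * x + b for (s, b), x in zip(model, prev))
```

A dialog system could call `predict` on the triple recognized from the user's last utterance to choose the emotional coloring of its reply. A model that lets each response dimension depend on all three input dimensions would be a natural extension.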