Unfortunately, detecting spam isn't an easy task. There are two key
ways of detecting spam: one is by looking at the host or person sending
it, and the other is by examining the content of the message.
The only way to truly know where an incoming e-mail is coming from is
via the IP address of the sending SMTP server. Everything else can be
faked, or spoofed, by a spammer. So if you want to filter out spam being
sent by a certain host you need to block the sending SMTP server's IP
address. To make it even more difficult, many spammers use multiple IP
addresses to send different e-mails out. So if you block just one IP
address of a spammer, you might still get future e-mails from them if
they send mail from another IP.
Looking at the content of a message to determine if it is spam is never
100% accurate. While you can search a message for keywords like "free
adult site", it is possible that a valid e-mail might contain almost any
keyword or phrase you come up with. For example, "There is a new free
adult site for previewing R rated movies now on blockbuster.com" might
be part of a valid message. So when we talk about using keywords to look
for spam, keep in mind that using them might filter out valid e-mails.
Getting properties from the message
In the above code sample we set the messagestatus property in the
EnvelopeFields namespace, but we now need to get at some additional
properties of the message to see if it is spam or not. The first one we
need to get is the IP address of the server sending the message. This
property is stored in the message header. In the line below we set the
MsgReceivedFrom variable to the received field in the message header.
MsgReceivedFrom = objMsg.Fields.Item("urn:schemas:mailheader:received").Value
Now that we have the information on where the message was received
from, we need to extract the IP address of the sending server from it.
The received field has a lot of other data in it besides the IP address
we are looking for, as the example below shows, so we need to pull out
just the part we need.
from mkt-mail.ebates.com ([63.236.56.147]
RDNS failed) by mail.corp.com with Microsoft SMTPSVC(5.0.2195.5329);
Tue, 19 Nov 2002 17:46:20 -0600
So we need to get the "63.236.56.147" from this field. The code below
uses the Mid function to get the data between the [ and ] characters.
The If statement is there to make sure that the Mid function does not
return an error: the ending position must be greater than the starting
position or it will return an error. So after running the code below,
the SendingHostIP variable should be set to 63.236.56.147.
StartIP = InStr(MsgReceivedFrom,"[")+1
EndIP = InStr(MsgReceivedFrom,"]")
If EndIP-StartIP > 0 Then
SendingHostIP = Mid(MsgReceivedFrom,StartIP,EndIP-StartIP)
End If
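The same extraction can be wrapped in a function so it is easy to try
outside the event sink. This is only a sketch: ExtractSenderIP is a
made-up name, and unlike the snippet above it looks for the ] that comes
after the [ rather than the first ] anywhere in the field. The
WScript.Echo lines assume you run it with cscript from a command prompt;
they are not available inside the SMTP event sink host.

```vbscript
' Hypothetical helper (not part of the article's script): returns the
' text between the first "[" and the next "]" in a received field, or
' an empty string when no such pair exists.
Function ExtractSenderIP(ReceivedHeader)
    Dim StartIP, EndIP
    ExtractSenderIP = ""
    StartIP = InStr(ReceivedHeader, "[")
    If StartIP = 0 Then Exit Function
    ' Search for the closing bracket only after the opening one
    EndIP = InStr(StartIP + 1, ReceivedHeader, "]")
    If EndIP > StartIP + 1 Then
        ExtractSenderIP = Mid(ReceivedHeader, StartIP + 1, EndIP - StartIP - 1)
    End If
End Function

Dim Sample
Sample = "from mkt-mail.ebates.com ([63.236.56.147] RDNS failed) " & _
         "by mail.corp.com with Microsoft SMTPSVC(5.0.2195.5329); " & _
         "Tue, 19 Nov 2002 17:46:20 -0600"
WScript.Echo ExtractSenderIP(Sample)                          ' prints 63.236.56.147
WScript.Echo "[" & ExtractSenderIP("no brackets here") & "]"  ' prints []
```

Returning an empty string when nothing is found lets the caller skip the
host check instead of looking up a meaningless name.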
Check to see if a host is a spammer
Now that we have the IP address of the sending SMTP server we need to
check whether the sender is a known spammer. There are a few ways that
this can be done, but one thing to keep in mind is that all incoming
SMTP mail is stopped until the script releases the message, since the
SMTP service only allows one event sink to run at a time. So the script
needs to run very fast, which reduces the time you can spend checking
the host. With this in mind I decided to take advantage of Active
Directory and store spam senders in the AD as contacts. There are a few
benefits to this approach. First, you can go to the OU where they are
stored, see a list of the spam senders, and easily add additional data
to the contacts. Second, it is very easy to add new contacts (spam
hosts) to the AD, since you just need to create a contact with the IP
address as the name of the contact. In addition, since the AD supports
delegation, you can delegate to someone the ability to manage the spam
hosts without giving them admin access to Exchange. These benefits make
managing your spam sites very easy. Finally, and possibly most important
for large organizations, the time it takes to bind to an object in the
AD is very small. This means the check should be very quick, and the
script should be able to handle a large flow of SMTP mail without
slowing down delivery of messages.
Most commercial packages use one or more of the public blacklists of
open relays or spammer sites to check whether the sending IP address
should be blocked. For speed they normally cache part of this data. The
included script includes support for querying these blacklist servers,
but that code isn't covered in this article. I will cover it in Part 2
of Managing Spam.
Below we set the SpamContactPath variable to the path of the possible
contact in the AD and then attempt to bind to it. We also tell the
script to continue if GetObject returns an error.
SpamContactPath = "LDAP://SRVHOUDC01/cn=" & SendingHostIP & ",ou=Spam Sites,ou=Admin,dc=corp,dc=com"
On Error Resume Next
Set objSpamContact = GetObject (SpamContactPath)
Now that we have attempted to bind to the object in the AD, we need to
check the error returned, if any, and block the message if the spam host
was found. So below we check for the error number 80072030, after
converting it to a hex and string value. If the error is 80072030,
"There is no such object on the server", then we assume the sender is
not a spammer at this point and set the SPAM variable to False. If we
are able to bind to the object, which means it does exist in the AD, we
set the SPAM variable to True.
If CStr(Hex(Err.Number)) = "80072030" Then
SPAM = False
ElseIf Err.Number = 0 Then
SPAM = True
End If
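One thing to watch with On Error Resume Next is that Err keeps the last
error until it is cleared, so an earlier failure could be mistaken for a
failed bind. A minimal sketch of the same check as a function, assuming
you want any error at all (not just 80072030) to mean "not a known spam
host"; IsKnownSpamHost is a made-up name, not part of the included
script.

```vbscript
' Hypothetical wrapper around the AD lookup. Returns True only when the
' bind succeeds; any error is treated as "not a known spam host".
Function IsKnownSpamHost(ContactPath)
    Dim objContact
    IsKnownSpamHost = False
    On Error Resume Next
    Err.Clear                      ' drop any stale error before binding
    Set objContact = GetObject(ContactPath)
    If Err.Number = 0 Then IsKnownSpamHost = True
    Err.Clear
    On Error GoTo 0
End Function

' Example call using the server and OU names from this article. On a
' machine that cannot reach that domain controller the bind fails and
' the function returns False.
Dim HostIP, IsSpamHost
HostIP = "63.236.56.147"
IsSpamHost = IsKnownSpamHost("LDAP://SRVHOUDC01/cn=" & HostIP & _
    ",ou=Spam Sites,ou=Admin,dc=corp,dc=com")
```

Treating every error as "not spam" keeps mail flowing if the domain
controller is unreachable, at the cost of letting spam through during
that time.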
Searching for keywords in the message
Since you won't have a list of all known spam hosts, nor will anyone
since new ones come up daily, you will need to filter by something else.
So now we turn to content filtering by looking for keywords in the
message data. Since content filtering can easily block valid messages,
causing false positives, you need to make sure that your keywords aren't
too generic. First you should create a subroutine that can check for
keywords when passed a string of text and a list of keywords.
In the one below, Data is the string being checked, Words is a list of
keywords where each keyword or phrase is separated by a ^, and SPAM is
the variable we are using to flag the message as spam or not. The first
thing we do is make sure the string being passed to the Sub is valid and
the message hasn't already been flagged as spam. Then we take the word
list, break it into individual words or phrases, and store them in an
array. This is done with the Split function. Last, we loop through each
value in the array using a For loop, which stops at the last value of
the array (retrieved using the UBound function) or when a keyword is
found.
Inside the loop the value of Data is checked each time to see if it
contains the current keyword. This is done with the InStr function.
Note that since the InStr function is case sensitive by default, we
convert the Data variable and the current keyword to lower case with the
LCase function. If the keyword does exist in Data then we set SPAM to
True and exit the For statement.
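Sub CheckWords (Data,Words,SPAM)
    ' Nothing to check, or the message is already flagged as spam
    If Trim(Data) = "" Or SPAM Then Exit Sub
    Dim WordArray, i
    ' Break the ^ separated list into individual keywords
    WordArray = Split (Words,"^",-1,1)
    For i = 0 to UBound(WordArray)
        ' InStr returns the match position, or 0 when there is no match
        If InStr(LCase(Data),LCase(WordArray(i))) > 0 Then
            SPAM = True
            Exit For
        End If
    Next
End Sub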
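To see how the Sub behaves, and how easily a valid message can trip it,
here is a small stand-alone test you can run with cscript. It repeats
the Sub so the file works on its own; the sample keyword list and
messages are made up for illustration.

```vbscript
Sub CheckWords(Data, Words, SPAM)
    If Trim(Data) = "" Or SPAM Then Exit Sub
    Dim WordArray, i
    WordArray = Split(Words, "^", -1, 1)
    For i = 0 To UBound(WordArray)
        If InStr(LCase(Data), LCase(WordArray(i))) > 0 Then
            SPAM = True
            Exit For
        End If
    Next
End Sub

' Illustrative keyword list, not the one used by the real script
Const SampleWords = "free adult site^refinance now"
Dim Flag

Flag = False
CheckWords "Refinance NOW and save", SampleWords, Flag
' Flag is now True: the match ignores case

Flag = False
CheckWords "Meeting moved to Tuesday", SampleWords, Flag
' Flag is still False: no keyword appears in the text

Flag = False
CheckWords "There is a new free adult site for previewing R rated movies", _
    SampleWords, Flag
' Flag is now True even though the message is valid: a false positive
```

Note that SPAM is passed by reference (the VBScript default), which is
why the Sub can change the caller's variable.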
Before we can call our CheckWords Sub we need to have the data we want
to check and a list of keywords. So at the beginning of the script you
should set a constant, Const, that contains the keywords you want to
check for, separated by the ^ (caret) character. Then in the main Sub,
ISMTPOnArrival_OnArrival, you can call the CheckWords Sub with the field
you want to check, MsgFrom in the case below, and the keyword list. If
you plan to check the body of the message you will probably want
another, more limited set of keywords, since there are some cases where
the words below may be valid in the body of a message.
Const KeyWords = "porn^Viagra^Horny Amateur^Refinance Now^Adult ^xxx^Ca$h^winder^erotic^ sex^casino"
MsgFrom = objMsg.From
CheckWords MsgFrom,KeyWords,SPAM
Getting the Class C address of a spam host
Since many spammers send messages out from multiple servers, filtering
out just one of their IP addresses will not block all the spam from
them. In most cases these spammers own a class C block of IP addresses
(256 addresses). So it makes sense to filter out the entire class C
address of a spam host. This can also be used to filter out spam sent
from dynamic addresses, like those given out over a cable or DSL line,
which spammers sometimes send mail from as well. The function below
returns only the first three parts of an IP address. This allows us to
use the SendingHostClassC variable to check for a contact in the AD.
SendingHostClassC = GetClassC(SendingHostIP) & ".0"
Function GetClassC (Data)
Dim IPs
IPs = Split (Data,".",-1,1)
If UBound(IPs) <> 3 Then
GetClassC = "Invalid"
Exit Function
End If
GetClassC = IPs(0) & "." & IPs(1) & "." & IPs(2)
End Function
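With the code above, an address that doesn't have four parts ends up as
the contact name "Invalid.0", which simply won't be found in the AD. If
you would rather skip the lookup in that case, a variation like the one
below returns an empty string instead. This is a sketch only;
GetClassCName is a made-up name, and the WScript.Echo lines assume you
run it with cscript.

```vbscript
' Hypothetical variation on GetClassC: returns the contact name for the
' class C block (first three parts plus ".0"), or an empty string when
' the input does not have exactly four dot-separated parts.
Function GetClassCName(IP)
    Dim Parts
    GetClassCName = ""
    Parts = Split(IP, ".")
    If UBound(Parts) = 3 Then
        GetClassCName = Parts(0) & "." & Parts(1) & "." & Parts(2) & ".0"
    End If
End Function

WScript.Echo GetClassCName("63.236.56.147")       ' prints 63.236.56.0
WScript.Echo "[" & GetClassCName("63.236") & "]"  ' prints []
```

Like the original, this only counts the parts; it does not check that
each part is a number between 0 and 255.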
Changing the message status
Now that we have a couple of checks in place, we need to set the message
status so Exchange won't deliver the mail if we find it is spam. In the
code below we check whether the message has been marked as spam. If SPAM
is False we set the EventStatus variable to 0 (cdoRunNextSink); the
default of messagestatus is 0 (cdoStatSuccess), so we don't need to
change it. If the message has been flagged as spam we need to update the
messagestatus property to tell Exchange not to deliver it, by setting
this property to 3 (cdoStatBadMail). We then set EventStatus to 1
(cdoSkipRemainingSinks) so no other event sinks run on the message.
If Not SPAM Then
EventStatus = 0
Else
objfields.Item("http://schemas.microsoft.com/cdo/SMTPenvelope/messagestatus").Value = 3
objfields.Update
EventStatus = 1
End If
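The bare numbers are easy to mix up, so you may prefer to declare them
as constants at the top of the script. The constant names below match
the CDO names used in this article; the DecideStatus Sub is made up
purely to show the two values being chosen together, and the
WScript.Echo line assumes you run it with cscript.

```vbscript
' Event status values (returned through EventStatus)
Const cdoRunNextSink = 0
Const cdoSkipRemainingSinks = 1
' Message status values (written to the messagestatus envelope field)
Const cdoStatSuccess = 0
Const cdoStatAbortDelivery = 2
Const cdoStatBadMail = 3

' Hypothetical helper: picks the message status and the event status
' for a message, given whether it was flagged as spam.
Sub DecideStatus(IsSpam, MessageStatus, EventStatus)
    If IsSpam Then
        MessageStatus = cdoStatBadMail          ' save to BadMail
        EventStatus = cdoSkipRemainingSinks     ' stop other sinks
    Else
        MessageStatus = cdoStatSuccess          ' deliver as normal
        EventStatus = cdoRunNextSink            ' let other sinks run
    End If
End Sub

Dim MsgStat, EvtStat
DecideStatus True, MsgStat, EvtStat
WScript.Echo MsgStat & " " & EvtStat   ' prints 3 1
```

Swapping cdoStatBadMail for cdoStatAbortDelivery in one place is then
all it takes to delete spam instead of saving it.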
We could alternatively set the message status to 2 (cdoStatAbortDelivery)
to have the message deleted instead of saving it in the BadMail
directory. I would recommend saving the message to the BadMail directory
because this allows for easy retrieval and forwarding of a message if it
turns out to be valid. To forward a message that was flagged as spam
incorrectly, you just need to drag the message, which will end in .BAD,
from the BadMail directory to the PickUp directory. Exchange will
process any items in this directory and, if they are formatted
correctly, will deliver them. The final version of the script includes
support for bypassing the filter on messages sent by this method. Before
you drop the message into the PickUp directory you should open it in
Notepad and remove any To: addresses that aren't local to your mail
system; otherwise Exchange will try to deliver the message to them as
well. This prevents users outside of your mail system from getting the
message twice.
Putting it all together
Below is a basic example of a fully working spam filtering script. The
CheckWords Sub and GetClassC Function covered above are not included, to
save space; they would need to be in the same .vbs file before this
script will work.
Const KeyWords = "porn^Viagra^Mortgage Rates^Refinance Now^Adult ^xxx^Ca$h^winder^erotic^ sex^casino"

Sub ISMTPOnArrival_OnArrival(ByVal objMsg, EventStatus)
    On Error Resume Next
    ' If the script fails anywhere we should assume the message is not spam
    SPAM = False
    ' The envelope fields hold the messagestatus property set further down
    Set objFields = objMsg.EnvelopeFields
    ' Pull the sending server's IP address out of the received field
    MsgReceivedFrom = objMsg.Fields.Item("urn:schemas:mailheader:received").Value
    StartIP = InStr(MsgReceivedFrom,"[")+1
    EndIP = InStr(MsgReceivedFrom,"]")
    If EndIP-StartIP > 0 Then
        SendingHostIP = Mid(MsgReceivedFrom,StartIP,EndIP-StartIP)
    End If
    SendingHostClassC = GetClassC(SendingHostIP) & ".0"
    SpamContactPath = "LDAP://SRVHOUDC01/cn=" & SendingHostClassC & ",ou=Spam Sites,ou=Admin,dc=corp,dc=com"
    ' Try to bind to a contact; a successful bind means a known spam host
    Err.Clear
    Set objSpamContact = GetObject (SpamContactPath)
    If Err.Number = 0 Then SPAM = True
    Err.Clear
    ' Check the From field for any of the keywords
    MsgFrom = objMsg.From
    CheckWords MsgFrom,KeyWords,SPAM
    If Not SPAM Then
        ' Message hasn't been marked as spam so continue delivery
        EventStatus = 0
    Else
        ' Change the message status so it isn't delivered
        objFields.Item("http://schemas.microsoft.com/cdo/SMTPenvelope/messagestatus").Value = 3
        objFields.Update
        ' Prevent running any other event sinks
        EventStatus = 1
    End If
End Sub