STORM (RT) - Real Time Response to Email Load
Opus One's RT product works with Innosoft's PMDF and Sun's SIMS software to provide a real-time email storm and bomb avoidance facility. The goal of RT is to slow the flow of "bad" mail through the email backbone while letting "good" mail flow into the backbone at the fastest rate possible.
RT detects and reacts to storms and bombs by "pushing back" on the sending email system. This puts the burden of dealing with the bomb on the sending email system manager, rather than on the RT system manager.
An email "storm" is defined as multiple email messages, sent from a single source, over a very short period of time, into the backbone. A typical storm involves a single user on the Internet sending many copies of a single message to a single recipient.
An email "bomb" is defined as a very large email message, typically in the multi-megabyte range.
1.2 A Note on Capabilities
The concepts of "sender" and "recipient" in Internet email are technically quite vague. When a person sends an email message to another person, at least three levels of "sender" are involved:
Because RT has a goal of dealing with very large volumes of mail very efficiently, it operates at the lowest level of "sender" and "receiver": the physical system which has sent the mail. This not only permits an order of magnitude increase in performance but also helps to guard against malicious users who vary their "From:" RFC-822 header field (typically to forge email).
However, RT is also aware of the RFC-821 Envelope FROM and TO addresses, since these are available to it with a very low performance cost. RT uses the physical sender, the Envelope FROM and the Envelope TO to make most of its decisions.
An adjunct to RT is the Filter Channel Plus. This can do additional kinds of filtering and mail handling. When both Filter Channel Plus and RT are installed on the same system, they can work together synergistically. If activated, Filter Channel Plus can feed information to RT, giving it additional information about the kinds of email which are being seen and giving it the opportunity to identify other "bad" email senders.
RT uses two main executable images and several "helper" programs. It is important that RT be properly installed, as a half-installation could leave the SMTP server unable to receive any email or could slow performance dramatically.
RT is a distributed system, consisting of a special hook into the email system's SMTP server software and a server which receives queries from the SMTP server, responds to them, and maintains history information on good and bad senders.
The part of RT which connects directly to the SMTP server is called "RT_VFY," (said "RT verify") and the server is called "RT_LOG." These do not have to run on the same system, or even on the same computing architecture or operating system.
When you add the RT mappings to your PMDF or SIMS system, you will need to encode the IP address of the RT_LOG server in them. You may not use a DNS name (e.g., rt_log.opus1.com); you must use an IP address (e.g., 126.96.36.199).
For performance and efficiency reasons, no DNS is available.
You must be using the PMDF or SIMS multithreaded SMTP server. Most PMDF and SIMS sites are already using this server; only old PMDF sites running on OpenVMS would ever use the old SMTP client/server. If you are an old-time PMDF site on OpenVMS, you should see Chapter 16.1 of the PMDF System Manager's Guide (version 5.0) for instructions on setting this up. It is preferable to install and begin using the multithreaded TCP SMTP server before installing RT. The multithreaded SMTP server increases performance of any mail system which has a moderately heavy load and sufficient extra memory.
RT itself does not put much of an additional load on the email backbone. However, sites which install RT typically anticipate having a heavy load to begin with. When installing RT, it is advisable to check all system tuning indicators to ensure that there are no bottlenecks imposed by the operating system.
2.1.1 OpenVMS Tuning Prerequisites
Make sure that the username used for PMDF operations (normally SYSTEM) has sufficient BYTLM (at least 100,000), WSEXTENT (at least 50,000), PGFLQUOTA (at least 100,000), BIOLM (at least 150), and FILLM (at least 200). You may also want to verify SYSGEN parameters which could affect this, including WSMAX, CHANNELCNT, and VIRTUALPAGECNT.
2.1.2 Unix Tuning Prerequisites
Generally, RT does not put much additional load on a system which is already running a heavy mail load. However, if you are installing RT in a new mail environment, you are strongly advised to get your PMDF or SIMS configuration up and running under load before attempting to add RT. Once the system is operational, normal Unix system tuning procedures should be used to ensure adequate disk and memory are available.
2.2 Images and System Startup
Put all of the RT files in a single directory easily accessible to the PMDF or SIMS software. Normally, this is PMDF_ROOT:[RT] (OpenVMS), /pmdf/rt (PMDF), /sims/rt (SIMS). If you choose another directory, you will have to hand-edit the command procedures and shell scripts to reflect this.
During system startup, the RT_LOG process must be started. On OpenVMS, the command procedure START_RT.COM should be called to install RT images and start the RT_LOG process. On Unix operating systems, the start_rt.sh script should be called from the appropriate run level to start the RT_LOG daemon. This should be started before any mail SMTP servers are running.
On OpenVMS, RT uses an installed image, pointed to by the logical name RT_VFY, to link to the multithreaded SMTP server. RT_VFY talks to RT_LOG using UDP in real time (hence the name, RT) to determine whether or not to accept an incoming connection.
RT_LOG must be running at all times when RT is in place. It is started by START_RT.COM (OpenVMS) or start_rt.sh (Unix).
RT_BLAB (on Unix: rt_blab) is a helper program which talks to RT_LOG and can download information about the state of the email system. Typically, you will run RT_BLAB automatically every ten or twenty minutes. If a threshold level of suspect mail connections is reached, then RT_BLAB will send email to users on a list for action or information. RT_BLAB is purely optional. If you want to use RT_BLAB, the RT_BLAB.COM (OpenVMS) command procedure gives an example of how to run it safely and what to do with the results. The START_RT command procedures/shell scripts do not start RT_BLAB.COM. Inside of RT_BLAB's driver command procedure, the logical name RT_BLAB_USERNAME is used to list those individuals who should receive notification bulletins from RT_BLAB about the state of the RT software. Because of the way that RT_BLAB works, it is only supported on Unix. However, similar functionality to RT_BLAB can easily be created with a short PERL script and shell script.
RT_CONTROL is a helper program which is used to tell RT_LOG to reload its operating parameters, to change log file versions, to dump data structures out, to restart/exit, and to display the log of recent denials. You will probably want to have RT_CONTROL available so that you can tell RT_LOG to reload its configuration files. It is preferable to reload RT_LOG rather than to restart RT_LOG, because the information about recent email is only stored in memory. If you restart RT_LOG, you get a clean slate---all information which was in the system is lost.
2.3 Mapping File
The standard mail system MAPPINGS. file (normally located in PMDF_TABLE but actually pointed to by the logical name PMDF_MAPPING_FILE) needs to be modified to cause the multithreaded SMTP server to consult RT when an incoming connection occurs. On Unix systems, the file is known as "mappings" and is normally in /pmdf/table (but actually pointed to by the entry in the /etc/pmdf_tailor file for PMDF_MAPPING_FILE).
For most OpenVMS sites, this means adding the following lines:
PORT_ACCESS

  *  $[RT_VFY,RT_VFY_QUERY,127.0.0.1|PORT|$0]
  *  $Y

MAIL_ACCESS

  *  $[RT_VFY,RT_VFY_QUERY,127.0.0.1|MAIL|$0]
  *  $Y
For most Unix sites, this means adding the following lines:
PORT_ACCESS

  *  $[/pmdf/lib/rt_vfy.so,RT_VFY_QUERY,127.0.0.1|PORT|$0]
  *  $Y

MAIL_ACCESS

  *  $[/pmdf/lib/rt_vfy.so,RT_VFY_QUERY,127.0.0.1|MAIL|$0]
  *  $Y
Unix sites will probably want to include a full path name for the rt_vfy.so image. PMDF software uses the dlopen() routine to find this image, so the defaults for directory and extension will vary from platform to platform. However, for fastest response time, putting in the full path name is preferred.
You should replace "127.0.0.1" with the IP address of your RT_LOG server. If the server is running on the same system as the RT_VFY application (i.e., on the PMDF system), then leave it as 127.0.0.1; this is marginally faster on IP stacks than putting in the real IP address and gives you greater independence if the PMDF system is moved to a different IP address.
Note that this particular image will be loaded once for every recipient of every message, so performance is a big deal.
Make sure, if you are using a compiled configuration, to rebuild the configuration after adding these to the MAPPINGS. (or mappings) file.
It is very important that the mappings have the exact form given here.
RT_VFY runs without any configuration file whatsoever; this is for performance reasons.
RT_LOG operation is governed by a configuration file, which is located in the directory /pmdf/table. Even though RT_LOG does not have to be on a system with PMDF, this file is normally located in this directory.
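If you change the RT_LOG configuration file, you have two options: (1) stop and restart RT_LOG, or (2) use RT_CONTROL to tell the running RT_LOG to reload its configuration file.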
Generally, option (2) is preferable.
Almost all of RT_LOG's parameters are dynamic and can be modified during operation. However, a very few having to do with table sizes and timers are not modifiable. Always check the documentation before assuming that a parameter can be changed on the fly!
Under normal circumstances, RT_LOG should not be stopped and restarted on a running system.
3.1 RT_OPTION configuration values
All RT parameters should be defined in RT_OPTION. All parameters may be changed while RT_LOG is running except those marked below as "not dynamic."
The format of the configuration file is fairly simple:
# comments
! comments
; comments
< included file
variable = value (for example: MAXRULES = 100, POSTMASTER = email@example.com)
There is a special case of
BAD_NAME = dan
BAD_NAME = ned
which results in an internal representation (keyvalue) of this as
BAD_NAME = dan;ned
No keyvalue may be longer than MAXVALUE characters (currently 32000 characters); anything which would put it over that limit will be ignored.
Note that spaces and case are not significant.
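The file format described above can be illustrated with a small parser. This is only a sketch, not RT_LOG's own code: the function name parse_rt_option and the include_loader callback are hypothetical, and it assumes that case-folding applies to variable names only and that all spaces are dropped from values.

```python
MAXVALUE = 32000  # limit quoted in the text above


def parse_rt_option(lines, include_loader=None, values=None):
    """Parse RT_OPTION-style lines into a dict keyed by upper-cased name."""
    values = {} if values is None else values
    for raw in lines:
        line = raw.strip()
        if not line or line[0] in "#!;":
            continue  # blank line or one of the three comment styles
        if line[0] == "<":
            # "< file" includes another file; reading it is left to a
            # caller-supplied callback (hypothetical helper)
            if include_loader is not None:
                parse_rt_option(include_loader(line[1:].strip()),
                                include_loader, values)
            continue
        key, sep, val = line.partition("=")
        if not sep:
            continue  # not a "variable = value" line
        # Assumption: spaces are dropped everywhere, case is folded on names
        key = "".join(key.split()).upper()
        val = "".join(val.split())
        # Repeated names are concatenated with ";"
        combined = values[key] + ";" + val if key in values else val
        if len(combined) <= MAXVALUE:
            values[key] = combined  # anything over the limit is ignored
    return values
```

For example, the lines "BAD_NAME = dan" and "bad_name = ned" produce the single keyvalue "dan;ned" under the name BAD_NAME.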
3.1.1 DEBUG; VERBOSE
DEBUG is set to 1 to say that debugging should be turned on.
VERBOSE is set to 1 to indicate that a record should be written to the RT log (standard out) for every deny or slow. Generally, you should leave VERBOSE set to 0. If you do set VERBOSE to 1, make sure you trim your log file.
Note that if DEBUG is set to 1, then this implies that VERBOSE is also set to 1.
3.1.2 GCTIM
GCTIM is the time between garbage collection runs. On a busy system, this value is almost irrelevant---you'll get GCMSGS first. On a slow system, you want to set this at some balanced time which won't burden the system, yet will let you detect spam. Remember: we don't know someone's been bad until we've GC'ed.
RT uses multiple threads to ensure that mail is not delayed while RT is performing "housekeeping" operations. However, there is a window during certain operations when incoming mail messages will be delayed until housekeeping has completed. For a modern processor, this is on the order of 100 to 400 milliseconds. GCTIM is used to determine how often the housekeeping jobs in RT operate.
This value is also used to define the "window" for an electronic mail storm to go unnoticed. RT is designed to use GCTIM as a hold-down timer so that a burst of email which occurs entirely between GCTIM intervals will not be delayed. The goal is to set this window wide enough to allow normal bursts of mail to enter without triggering the storm code to shut down mail yet not so large as to fail to stop a real storm.
In version 2.0 of RT, this logical name is expressed in seconds. (Previously it was an OpenVMS delta time.)
GCTIM values of less than 10 seconds are strongly discouraged and will probably result in very poor performance.
NOTE: This parameter is not dynamic; if you want to change it, you must restart RT_LOG.
3.1.3 GCMSGS
GCMSGS is the number of messages (actually: recipients) between garbage collection runs. On a busy system, this value needs to be carefully tweaked to be as low as possible to catch any spam, yet not so low as to affect operation of the system.
GCMSGS values of less than 50 are strongly discouraged and will probably result in very poor performance.
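The interplay between GCTIM and GCMSGS can be sketched as "whichever limit is reached first triggers a garbage collection run." The class below is a hypothetical illustration of that rule, not RT_LOG's implementation; the class name, method names, and the injectable clock are all assumptions.

```python
import time


class GcScheduler:
    """Decide when a garbage collection (GC) pass is due."""

    def __init__(self, gctim_seconds, gcmsgs, clock=time.monotonic):
        self.gctim = gctim_seconds   # seconds between GC runs
        self.gcmsgs = gcmsgs         # recipients between GC runs
        self.clock = clock
        self.last_gc = clock()
        self.since_gc = 0

    def record_recipient(self):
        """Count one recipient and report whether GC is now due."""
        self.since_gc += 1
        return self.due()

    def due(self):
        elapsed = self.clock() - self.last_gc
        # Busy systems hit the GCMSGS limit first; quiet ones hit GCTIM
        return self.since_gc >= self.gcmsgs or elapsed >= self.gctim

    def mark_done(self):
        """Reset both counters after a GC pass has completed."""
        self.last_gc = self.clock()
        self.since_gc = 0
```

On a busy system the recipient counter reaches GCMSGS long before GCTIM seconds have passed, which is why GCTIM is described above as almost irrelevant there.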
3.1.4 DETAIL_AGE
DETAIL_AGE is the longest time that we'll keep track of a record for possible computation in the thresholds. This value should be longer than any of the _PERIOD values below, but it doesn't have to be more than 1 second longer.
default: 6000 (100 minutes)
3.1.5 ACCEPT_IP, REJECT_IP
ACCEPT_IP is a list of IP addresses which should always be accepted no matter what else happens. You may specify these on a single line separated by semi-colons or you may put them on separate lines and they will be concatenated together automatically by RT. This parameter can be used to define common trading partners for electronic mail which have volumes much higher than the RT storm and bomb detection parameters would allow.
Exercise caution in indiscriminate use of this parameter. Since most SMTP mailers are configured to accept and pass on any message they receive, a clever attacker could simply redirect a storm of mail through any site on the ACCEPT_IP list. If possible, ask sites on the ACCEPT_IP list to reconfigure their mailers to operate in a more secure fashion.
IP addresses should be in the format "a.b.c.d" or "a.b.c.d/e" where "a.b.c.d" is a simple IP address which will match EXACTLY and "a.b.c.d/e" is a subnet-qualified IP address (such as 188.8.131.52/24) which will match all addresses that are in that net/subnet as specified. If you don't know what the slash notation means, there are RFCs to help.
REJECT_IP is like ACCEPT_IP, except that these are addresses which will ALWAYS be rejected.
default: no default
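The two address formats can be illustrated with Python's standard ipaddress module. This is a sketch of the matching rule only, not RT's code; the function names are hypothetical and the addresses in the usage note are documentation-range examples.

```python
import ipaddress


def parse_ip_list(value):
    """Turn an 'a.b.c.d;a.b.c.d/e' list into network objects."""
    nets = []
    for item in value.split(";"):
        item = item.strip()
        if item:
            # A bare address becomes a /32 network, so it matches exactly.
            # strict=False tolerates host bits set in a.b.c.d/e entries.
            nets.append(ipaddress.ip_network(item, strict=False))
    return nets


def ip_matches(address, nets):
    """True if the address is in any listed host or subnet entry."""
    addr = ipaddress.ip_address(address)
    return any(addr in net for net in nets)
```

With the list "192.0.2.10; 198.51.100.0/24", the address 192.0.2.10 matches exactly, 192.0.2.11 does not match, and every address from 198.51.100.0 through 198.51.100.255 matches the subnet entry.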
3.1.6 ACCEPT_EFROM, REJECT_EFROM, ACCEPT_ETO, REJECT_ETO
ACCEPT_EFROM is a list of envelope "FROM" addresses which will always be accepted. You may specify these on a single line separated by semicolons or you may put them on separate lines and they will be concatenated together automatically by RT.
You may use the wildcard character "*" in your ACCEPT_EFROM (such as "*@opus1.com"). All comparisons of email addresses are case-insensitive (e.g., jms@opus1.com is the same as JMS@Opus1.COM).
default: no default
REJECT_EFROM is like ACCEPT_EFROM, except that these are envelope from addresses which will ALWAYS be rejected.
ACCEPT_ETO and REJECT_ETO are the same as ACCEPT_EFROM and REJECT_EFROM, but work with envelope TO addresses instead.
default: none (for either)
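The wildcard and case rules for envelope addresses can be sketched as follows. The function name is hypothetical, and the sketch assumes that "*" matches any run of characters (including none) and that no other character is special.

```python
import re


def address_matches(address, patterns):
    """True if address matches any ';'-separated pattern, ignoring case."""
    for pattern in patterns.split(";"):
        pattern = pattern.strip()
        if not pattern:
            continue
        # Escape everything literally, then let each "*" match any run
        regex = ".*".join(re.escape(part) for part in pattern.split("*"))
        if re.fullmatch(regex, address, flags=re.IGNORECASE):
            return True
    return False
```

Under these assumptions, address_matches("JMS@Opus1.COM", "*@opus1.com") is true, while an address in any other domain does not match that pattern.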
3.1.7 SPAMBAIT
SPAMBAIT is a list of Envelope TO addresses which are always rejected. However, SPAMBAIT addresses are a sign that this message is so badly out of whack that it should not be accepted, nor should anything else from this user for some period of time. Therefore, when we get a SPAMBAIT match, we IMMEDIATELY put the IP address of this user on the "do not accept" list.
Like the REJECT_EFROM addresses, SPAMBAIT addresses may be put all on a single line, or put on multiple lines.
3.1.8 IP_DENY_*, IP_SLOW_*
IP_DENY_LIMIT and IP_DENY_PERIOD work together to determine the basic spamming threshold. Any IP address which sends more than IP_DENY_LIMIT addresses during IP_DENY_PERIOD will be put on the deny list until they naturally fall off of it (i.e., until their send rate falls below the IP_DENY_LIMIT).
Thus, the idea is to have the DENY_LIMIT be fairly small, and the DENY_PERIOD be fairly long. That way, once they hit the LIMIT they will stay there until the PERIOD times out. If PERIOD is too short, the list will oscillate too much.
default for IP_DENY_LIMIT 300 (300 msgs)
default for IP_DENY_PERIOD 300 (300 seconds, or 5 minutes)
IP_SLOW_LIMIT and IP_SLOW_PERIOD are similar to the IP_DENY_* values except that an IP address passing the threshold of IP_SLOW_LIMIT will only be slowed down by IP_SLOW_TIME.
IP_SLOW_LIMIT should be much less than IP_DENY_LIMIT to throttle the user before turning them off.
default for IP_SLOW_LIMIT 200 (200 msgs)
default for IP_SLOW_PERIOD 300 (300 seconds, or 5 minutes)
default for IP_SLOW_TIME 10 (10 hundredths of a second, or 1/10 second)
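The way the IP_DENY_* and IP_SLOW_* values interact can be sketched with a per-sender history of timestamps. This is a simplified illustration using the default values, not RT_LOG's implementation: the class and method names are hypothetical, "passing the threshold" is assumed to mean strictly more than the limit, and the verdict is computed on demand rather than at garbage collection time.

```python
IP_SLOW_LIMIT, IP_SLOW_PERIOD = 200, 300   # defaults from the text
IP_DENY_LIMIT, IP_DENY_PERIOD = 300, 300


class SenderHistory:
    """Timestamps (in seconds) of recipients seen from one IP address."""

    def __init__(self):
        self.stamps = []

    def record(self, now):
        self.stamps.append(now)

    def count_within(self, period, now):
        """How many recipients arrived in the last `period` seconds."""
        return sum(1 for t in self.stamps if now - t <= period)

    def verdict(self, now):
        # Check the stricter action first: deny outranks slow
        if self.count_within(IP_DENY_PERIOD, now) > IP_DENY_LIMIT:
            return "deny"
        if self.count_within(IP_SLOW_PERIOD, now) > IP_SLOW_LIMIT:
            return "slow"
        return "accept"
```

A sender drops back to "accept" on its own once enough old timestamps age out of the period, which matches the description of addresses naturally falling off the deny list.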
3.1.9 EFROM_SLOW_*, EFROM_DENY_*, ETO_SLOW_*, ETO_DENY_*
EFROM_SLOW_LIMIT, EFROM_SLOW_PERIOD, EFROM_SLOW_TIME and EFROM_DENY_LIMIT are all similar to the IP_DENY_* and IP_SLOW_* values, except that there is no EFROM_DENY_PERIOD (use EFROM_SLOW_PERIOD instead): EFROM_SLOW_LIMIT messages in EFROM_SLOW_PERIOD time will cause future messages to be delayed by EFROM_SLOW_TIME; EFROM_DENY_LIMIT messages in EFROM_SLOW_PERIOD time will cause future messages to be simply refused.
default for EFROM_SLOW_LIMIT 200 (200 msgs)
default for EFROM_SLOW_PERIOD 300 (300 seconds, or 5 minutes)
default for EFROM_SLOW_TIME 10 (10 hundredths of a second, or 1/10 second)
default for EFROM_DENY_LIMIT 300 (300 msgs)
ETO_SLOW_LIMIT, ETO_SLOW_PERIOD, ETO_SLOW_TIME and ETO_DENY_LIMIT are all similar to the IP_DENY_* and IP_SLOW_* values, except that there is no ETO_DENY_PERIOD.
default for ETO_SLOW_LIMIT 200 (200 msgs)
default for ETO_SLOW_PERIOD 300 (300 seconds, or 5 minutes)
default for ETO_SLOW_TIME 10 (10 hundredths of a second, or 1/10 second)
default for ETO_DENY_LIMIT 300 (300 msgs)
3.1.10 MAXDETAIL, MAXSTATICS, MAXDENY
MAXDETAIL, MAXSTATICS, and MAXDENY are all semi-dynamic parameters. They are read once during system startup and are not ever changed; if you want to change the size of these lists, you'll have to restart the RT_LOG software process.
MAXDETAIL is the length of the detail list. You want to be able to keep one record for each recipient for the period of time you're holding records (DETAIL_AGE). Thus, if you're holding messages for 1 hour, and you do 50,000 messages a day, you probably want MAXDETAIL to be at least 2,000 and probably more like 5,000.
RT has been tested with up to 10,000 detail records. When there are 10,000 detail records, it takes approximately .4 CPU seconds on a 175 MHz Alpha to handle the GC task. If you need more than 10,000 records, that's fine, but you should test to make sure that garbage collection (GC) time doesn't make the system performance suffer. As long as garbage collection can run in less than 1 elapsed/CPU second, you're probably safe.
MAXSTATICS is used for the static lists of Rejected/Accepted IP/EFROM/ETO addresses from this file. There are six lists and each is the same size (MAXSTATICS). Make sure that MAXSTATICS is larger than your longest list. It does not hurt much to make MAXSTATICS very large.
MAXDENY is a list which RT uses to log in memory actions (in addition to writing them to the log file). If you intend to use this list, then you should make MAXDENY large enough to hold all the action records (an action record is written for a deny or a slow, but not for a success) you will hold between data polls. Each data poll is destructive---it erases the array.
Be wary of making MAXDENY too large, as when it is dumped, all of the contents are pushed out onto the network and this can cause performance problems. If you just use the log to see what is going on, then make MAXDENY something small, like 10 or 20.
NOTE: These three parameters are not dynamic; if you want to change them, you must restart RT_LOG.
default MAXDETAIL 50000 (recipients)
default MAXSTATICS 1000 (entries in the static lists)
default MAXDENY 1000 (action records/deny or slow)
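The MAXDETAIL sizing rule (one record per recipient for as long as records are held) reduces to simple arithmetic. The helper below is a hypothetical sketch that computes the average number of records alive at once, assuming mail arrives evenly over the day; real traffic is bursty, so the result is a lower bound rather than a recommended setting.

```python
import math


def min_maxdetail(recipients_per_day, detail_age_seconds):
    """Average number of detail records held at once, rounded up."""
    seconds_per_day = 86400
    return math.ceil(recipients_per_day * detail_age_seconds / seconds_per_day)


# 50,000 recipients a day held for 1 hour
print(min_maxdetail(50000, 3600))   # 2084
# the same load held for the default DETAIL_AGE of 6000 seconds
print(min_maxdetail(50000, 6000))   # 3473
```

The first figure agrees with the "at least 2,000" guideline above; the suggestion of 5,000 leaves headroom for hours that are busier than average.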
RT_LOG writes to the file pointed to by the option "LOCAL_LOGFILE." If this option is not present, it writes to PMDF_LOG:RT.LOG (on OpenVMS) or /usr/pmdf/log/rt.log (on Unix operating systems).
If you want to have the output of the log go to the screen for debugging, you may set the LOCAL_LOGFILE option to be "stdout." In this case, RT_LOG simply writes to stdout. ("stderr" has a similar effect).
To "wrap" the log file so that you can cut it and read it with another program, you can use RT_CONTROL to give the "L" command to roll the log file. Rolling the log file with the "L" command causes RT_LOG to do the following:
3.2 BLAB Logical names
RT_BLAB_USERNAME - where RT_BLAB will send mail when a mail storm or bomb has been detected. Normally, this is set to a mailing list.
4.0 RT Accessory Programs
RT_CONTROL takes two arguments: an IP address, and a single letter command.
The command is sent to the server specified in the IP address (you may use a domain name if you wish). Then responses are read from the server until it signals that it is done; these are all printed on the standard output.
RT_CONTROL rt.opus1.com X
RT commands are:
D - dump the data structures to the RT log (no obvious output)
X - make RT_LOG exit. Use with care!
R - reload. Reload the PMDF_TABLE:RT_OPTION (or /pmdf/table/rt_option) file.
N - read deny log. Display the log on standard output.
T - show statistics. Display statistics on standard output.
C - clear statistics.
L - roll the log file.
RT_BLAB is used to read the deny log periodically and send any "significant" results to the system manager. It is called with two arguments: an IP address, and the number of events to be considered "significant."
RT_BLAB connects to the RT server and reads (destructively) the DENY log. If there are more events in the deny log than the event count on the command line, then an email message is sent to the appropriate people (based on the RT_BLAB_USERNAME logical name) telling them about it.
If there are NOT sufficient events, then RT_BLAB just exits.
$ define rt_blab_username "email@example.com"
$ rt_blab rt.opus1.com 100
If there are not at least 100 deny events in the rt.opus1.com log, then rt_blab simply reads the log and exits.
The RT application uses a very simple protocol. The client (whether RT_VFY, RT_BLAB, or RT_CONTROL) sends a single UDP packet to the server on port 24.
The server then returns one or more packets as an answer. For the Query packet, only a single answer is returned. For other kinds of commands, one or more packets will come back; the transaction is over when a packet is sent which starts with the upper-case letter E.
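The exchange for non-query commands can be sketched as a small UDP client. This is an illustration only: the function name is hypothetical, and it assumes that the request packet contains nothing but the single command letter in ASCII, which the text does not state. It is not suitable for Query packets, whose single answer does not start with "E".

```python
import socket

RT_PORT = 24  # port named in the text


def rt_command(server_ip, command, port=RT_PORT, timeout=5.0):
    """Send one command packet and collect replies until one starts with 'E'."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # UDP is unreliable; do not wait forever
        # Assumption: the packet body is just the command letter
        sock.sendto(command.encode("ascii"), (server_ip, port))
        while True:
            data, _ = sock.recvfrom(65535)
            text = data.decode("ascii", errors="replace")
            replies.append(text)
            if text.startswith("E"):
                break  # e.g. "END: ..." closes the transaction
    return replies
```

If no terminating packet arrives within the timeout, the socket raises a timeout exception, which a caller would normally treat as "server not reachable."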
5.1 Query packets
Client to server:
Server to client:
where <success> is either "S" indicating a successful return to PMDF or "F" indicating a failure return.
If <success> is "S", the rest of the string is sent to PMDF as the result of the mapping, unchanged. If <success> is "F", then the <string>, if present, is ignored.
S$D-500|$X4.7.1|$NService$ not$ currently$ available
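How a client might interpret the reply can be sketched as below. The function name is hypothetical, and the sketch assumes, based on the example above, that the string follows the <success> letter directly with no separator.

```python
def parse_query_reply(packet):
    """Split a Query reply into (success, mapping_result)."""
    if not packet or packet[0] not in ("S", "F"):
        raise ValueError("reply must start with 'S' or 'F'")
    if packet[0] == "S":
        # The rest of the string goes back to PMDF unchanged
        return True, packet[1:]
    # On failure any trailing string is ignored
    return False, None
```

For the example reply above, this returns success together with the string "$D-500|$X4.7.1|$NService$ not$ currently$ available".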
5.2 Other commands
D - dump the data structures to the RT log; when done, send "END: Dumping data structures now" to client.
X - exit. Send "Exiting at user command" to the client as confirmation.
R - reload. Reload the PMDF_TABLE:RT_OPTION (or /pmdf/table/rt_option) file. Any parameters which are changed will be written to the RT log file. When done, send "END: Parameters reloaded" to network.
N - read deny log. Send the contents of the Deny log to the network and clear the Deny log.
S - spamsum. Not yet implemented.
T - show statistics. Send statistics to the network and to the RT log file.
C - clear statistics. Clear most counters.
6.0 CONCEPTS for anti-spam software
Not all of the concepts below are implemented in RT version 2. However, they give an overview of the process and the directions that this product is going to take.
1) We intercept mail at two different points:
a) when it is being transmitted to us via SMTP. In this case, we get the following information:
Connection-time information:
- source and destination IP addresses and port numbers
POSSIBLE ACTIONS: accept or deny connection
Per-recipient information:
- source and destination IP addresses and port numbers
- Envelope FROM and Envelope TO addresses
- PMDF channel names (not very useful)
POSSIBLE ACTIONS: accept or deny the recipient with 400 or 500 level error code; can also delay response to "slow down" the transaction
b) when the message has been received and queued, but before it is delivered to the end user. NOTE: This is very high overhead, in the sense that we now have to handle every message twice. Essentially, when we run the filter channel, we reduce the potential throughput of the relay. In this case, we have the entire message to handle and filter on. We can accept, reject, forward, etc. the message. However, for spam, rejection is likely to be useless.
2) We can maintain as much state information as needed. The RT (RT = real time) server sits on UDP port 24 and can communicate with the SMTP receiver (1a above) and the filter channel (1b above).
3) As each message comes in, the filter channel will compute a "spamsum" of the message using the following algorithm:
The spamsum is then sent to the RT server, which will maintain a history of messages with identical spamsums.
4) The general mechanism RT uses looks something like this:
5) Here are the generic anti-spam criteria
For the RT server:
For the Filter Channel:
The "spamsum count" of a message is the number of times that the spamsum of this message has been seen in the past <xxx> hours, times the natural log of the number of recipients of the message (e.g., a message which has been seen by one person has a spamsum count of 1; by 10 people a count of 2.3; by 100 people a count of 4.6; by 1000 people a count of 6.9).
Any message with a spamsum count of 5 gets the sender IP address put on the "slow" list for <yyy> minutes; all transactions will be slowed by <zzz> seconds.
Any message with a spamsum count of 10 gets the sender IP address put on the "deny" list for <yyy> minutes.
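The spamsum count arithmetic can be sketched as follows. This is an illustration of the concept only, since the text notes that not all of these concepts are implemented. The function names are hypothetical. Because the natural log of 1 is 0 while the example gives a single-recipient message a count of 1, the sketch assumes the recipient weight has a floor of 1; it also assumes the thresholds of 5 and 10 mean "at least."

```python
import math


def spamsum_count(times_seen, recipients):
    """Times the spamsum was seen, weighted by ln(number of recipients)."""
    # Assumption: weight never drops below 1, so one recipient counts as 1
    weight = max(1.0, math.log(recipients))
    return times_seen * weight


def spamsum_action(count):
    """Map a spamsum count to the list the sender IP is put on."""
    if count >= 10:
        return "deny"
    if count >= 5:
        return "slow"
    return "none"
```

Under these assumptions, a message seen once with 1000 recipients has a count of about 6.9 and would put its sender on the "slow" list; seen twice, the count is about 13.8 and the sender would go on the "deny" list.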
© 1999 Opus One