amavis-user September 2010 archive
Main Archive Page > Month Archives  > amavis-user archives
amavis-user: Re: [AMaViS-user] Complex sql lookup query, bad per

Re: [AMaViS-user] Complex sql lookup query, bad performance

From: Mark Martinec <Mark.Martinec+amavis_at_nospam>
Date: Fri Sep 17 2010 - 15:29:41 GMT
To: amavis-user@lists.sourceforge.net

Marco,

> We build a clustered mail system with postfix, amavisd, clamd, SA,
> dovecot, mysql, etc. 2 Servers are handling incoming mail (MX records
> pointing to us) and 2 are handling outgoing mail (relaying). mysql
> database is stored on 2 master-master replicated servers.
>
> so far, everything is working well. but we have problems with the
> sql_lookup from amavisd-new that we need to lookup per recipient
> whitelisting.
>
> $sql_select_policy = 'select amavis_policy.*,amavis_recipients.id as id '.
> 'FROM (hostings,domains,amavis_recipients,products,amavis_policy) LEFT JOIN (users,accounts) '.
> 'ON accounts.hosting_id=hostings.id AND users.account_id=accounts.id '.
> 'WHERE (amavis_recipients.hosting_id = hostings.id OR amavis_recipients.account_id = accounts.id) '.
> 'AND domains.hosting_id=hostings.id '.
> 'AND hostings.product_id = products.id '.
> 'AND hostings.amavis_policy_id = amavis_policy.id '.
> 'AND ( (is_mailbox=1 AND concat(username,\'@\',domainname) IN (%k)) '.
> ' OR ((is_transport=1 OR is_mxbackup=1) AND concat(\'@\',domainname) IN (%k))'.
> ')';
> [...]
> The DB layout is like this: one account can have multiple usernames
> (aliases), one hosting can have multiple domains, one hosting can have
> multiple accounts.
> that means, that we don't have a email address field. so we have to
> merge (concat) domainname and username field to check the %k strings.
>
> we also need per recipient email address and domain lookup, not just
> email address. because we also do something like email proxy, where we
> don't know the usernames, just the domains for a customer.
>
> the query takes about 8 seconds with 3500 usernames and 400 domains.
>
> is there a way to change the query like this (separate variables in
> the query for domain and username)?
>
> select amavis_policy.*,amavis_recipients.id as id FROM
> (hostings,domains,amavis_recipients,products,amavis_policy) LEFT JOIN
> (users,accounts) ON accounts.hosting_id=hostings.id AND
> users.account_id=accounts.id WHERE (amavis_recipients.hosting_id =
> hostings.id OR amavis_recipients.account_id = accounts.id) AND
> domains.hosting_id=hostings.id AND hostings.product_id = products.id
> AND hostings.amavis_policy_id = amavis_policy.id AND (
> (is_mailbox=1 AND username='<<<<username>>>>' and
> domainname='<<<<domain>>>>') OR ((is_transport=1 OR is_mxbackup=1) AND
> domainname='<<<<domain>>>>'));
>
> this query is executed in ~0.04 sec... what would be okay...

Adding a few more placeholders to existing %k and %a would be
a small change. I can add the following:

  %l -> full unmodified localpart,
  %u -> lowercased username (a localpart without extension)
  %e -> lowercased address extension (including a delimiter),
  %d -> lowercased domain,

if that will help in your situation.

For example, given the: User+Foo@Sub.Example.COM
  
  %a User+Foo@Sub.Example.COM
  %l User+Foo
  %u user
  %e +foo
  %d sub.example.com

  %k User+Foo@sub.exAMPLE.COM
     user+foo@sub.example.com
     user@sub.example.com
     user+foo
     user
     @sub.example.com
     @.sub.example.com
     @.example.com
     @.com
     @.

Below is a patch for 2.6.4. It will go into in 2.7.0-pre8.

--- amavisd.orig 2009-06-25 14:39:01.000000000 +0200
+++ amavisd 2010-09-17 17:13:34.139201079 +0200
@@ -1138,9 +1138,12 @@
   # for a recipient (e.g. a full address, domain only, catchall), %a will be
   # replaced by an exact recipient address (same as the first entry in %k,
- # suitable for pattern matching). Use ORDER, if there is a chance that
- # multiple records will match - the first match wins (i.e. the first
- # returned record). If field names are not unique (e.g. 'id'), the later
- # field overwrites the earlier in a hash returned by lookup, which is why
- # we use 'users.*, policy.*, users.id', i.e. the id is repeated at the end.
+ # suitable for pattern matching), %l by a full unmodified localpart, %u by
+ # a lowercased username (a localpart without extension), %e by lowercased
+ # addr extension (which includes a delimiter), and %d for lowercased domain.
+ # Use ORDER if there is a chance that multiple records will match - the
+ # first match wins (i.e. the first returned record). If field names are
+ # not unique (e.g. 'id'), the later field overwrites the earlier in a hash
+ # returned by lookup, which is why we use 'users.*, policy.*, users.id',
+ # i.e. the id is repeated at the end.
   # This is a legacy variable for upwards compatibility, now only referenced
   # by the program through a %sql_clause entry 'sel_policy' - newer config
@@ -14730,5 +14733,6 @@
     # $clause_name is a key into %sql_clause of the currently selected
     # policy bank; one level of indirection is allowed in %sql_clause result,
- # the resulting SQL clause may include %k or %a, to be expanded
+ # the resulting SQL clause may include %k, %a, %l, %u, %e, %d placeholders,
+ # to be expanded
     bless { conn_h => $conn_h, incarnation => 0, clause_name => $clause_name },
           $class;
@@ -14831,14 +14835,22 @@
   local($1); my(@pos_args); my($sel_taint) = substr($sel,0,0); # taintedness
   my($datatype) = $sql_allow_8bit_address ? SQL_VARBINARY : SQL_VARCHAR;
- $sel =~ s{ ( %k | %a | \? ) } # substitute %k for keys, %a for exact mail
- # address, and ? for each extra arg
+
+ # substitute %k for a list of keys, %a for unmodified full mail address,
+ # %l for full unmodified localpart, %u for lowercased username (a localpart
+ # without extension), %e for lowercased extension, %d for lowercased domain,
+ # and ? for each extra argument
+ $sel =~ s{ ( %[kalued] | \? ) }
            { push(@pos_args, $1 eq '%k' ? map { [$_, $datatype] } @$keys_ref
- : $1 eq '%a' ? [$keys_ref->[0], $datatype]
- # same as first in %k
+ : $1 eq '%a' ? [$rhs_ref->[0], $datatype] #full addr
+ : $1 eq '%l' ? [$rhs_ref->[1], $datatype] #localpart
+ : $1 eq '%u' ? [$rhs_ref->[2], $datatype] #username
+ : $1 eq '%e' ? [$rhs_ref->[3], $datatype] #extension
+ : $1 eq '%d' ? [$rhs_ref->[4], $datatype] #domain
                            : shift @extras_tmp),
              $1 eq '%k' ? join(',', ('?') x $n) : '?' }gxe;
   $sel = untaint($sel) . $sel_taint; # keep original clause taintedness
- ll(4) && do_log(4,"lookup_sql \"%s\", query args: %s",
- $addr, join(', ', map{"\"$_\""} @pos_args) );
+ ll(4) && do_log(4,"lookup_sql \"%s\", query args: %s", $addr,
+ join(', ', map { !ref $_ ? '"'.$_.'"' : '['.join(',',@$_).']'}
+ @pos_args) );
   ll(4) && do_log(4,"lookup_sql select: %s", $sel);
   my($a_ref,$found); my($match) = {}; my($conn_h) = $self->{conn_h};

Mark

------------------------------------------------------------------------------
Start uncovering the many advantages of virtual appliances
and start using them to simplify application deployment and
accelerate your shift to cloud computing.
http://p.sf.net/sfu/novell-sfdev2dev
_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
 Please visit http://www.ijs.si/software/amavisd/ regularly
 For administrativa requests please send email to rainer at openantivirus dot org