Tag Archives: spam

new regex stuff!

logical operators! thanks ian! 😉

+ () [] - |

(stuff that remains the same)+(stuff that changes) – otherwise known as “capture groups”

[89] = 8 or 9

[0-4] = 0, 1, 2, 3, or 4

| = logical OR

so…

\D(85\.157\.47\.)+(12[89]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])\D

means “capture everything in 85.157.47.128/25”

which, up until now, has meant “make a separate rule for every IP address between 85.157.47.128 and 85.157.47.255” — 128 SEPARATE RULES, which takes A LONG time, and slows down processing speed.

this is a BIG step forward!

WOO!!! 😎👍

ETA 200205: even more WOO!!! because ian directed me to a RegEx Numeric Range Generator, which means that i don’t have to figure them all out myself! WOO!!! 😎👍

calm… i hope no storm…

the past three full days now, i have gotten SIGNIFICANTLY less spam than normal… like, normally i’ll get anywhere from two to six DOZEN spam messages a day, and, since saturday, i have gotten, maybe two dozen total

i’ve been blocking ranges of IP addresses in argentina and peru and china and india and denmark and kazakhstan and iran and lithuania and brazil and germany and LOTS of ranges for russia, and luxembourg and vietnam and turkey and indonesia and romania and the UK and georgia (the country, not the state in the united states), and nigeria and egypt and cambodia and myanmar (and that’s only up to the 45.0.0.0/8 range) like a mad fiend, for about two months prior to saturday… and all of those places are places from which i have never received email that was not spam…

literally, i’ve been blocking JUST ranges connected with the 1LfYcbCsssB2niF3VWRBTVZFExzsweyPGQ “bitcoin porn sextortion” scam since october 4th. 🤬

maybe i’ve finally caught up with the script. i’ve got 1,043 filter rules, and a fair portion of them are IP ranges…

but it feels weird… nobody has complained that they’re not getting important emails, and the false positives that have been coming through are usually either dealt with by changing “contains” to “matches regex”, or by deleting rules that i don’t need any longer… like the one for the .mp TLD, which was giving me false positives all the time because of mailchi.mp, which, while spammy, is not universally spammy, and, as far as i can tell, is the only NON-spammy use of the .mp TLD… but i decided that, instead of figuring out how to rule out legitimate use of a spammy TLD, i just started banning the countries that the spam was coming from…

but it feels weird… i’ve been on edge for a couple of days now, and i’m pretty sure it’s directly related to my relationship with the computer and the ‘net… 😒

but not entirely related… i had a pair of blue sunglasses that i got before i went to oregon to busk, a few months ago, and i lost them about a week ago. since then i’ve been losing a whole bunch of other things — keys, tools, credit cards, that sort of thing — and i’ve been finding them again, usually in the same day, sometimes within the same 15 minutes or so… but i haven’t been able to find my sunglasses, and it PISSES ME OFF because the reason i got them, primarily, was to help aleviate some of my depression, and they have worked ADMIRABLY for that purpose… and i remember thinking, if i put them… wherever it was that i put them… 😕 and left them there for too long, i would probably not remember where they were, the next time i looked for them… 😒

it’s possible that they’re somewhere around the house, but i’ve looked at least three times in every place i can think of, and quite a few that i couldn’t have thought of in a long time, and have nothing to show for it except a much cleaner house. they’re not in the car, as far as i can tell, nor are they in my tuba case, or my tuba bag.

moe is going away for a few days — travelling for stuff related to her book — starting friday, which means that i won’t be able to go busking. and then panto starts (shudder) saturday: two shows, and two shows on sunday, which means that i won’t even be here to take care of the pets for significant portions of both days… fortunately, i’m picking her up at the airport after sunday’s shows are over.

and, on the unicycle side of things, i think i am actually learning to ride the unicycle… i have been consistently riding, in a “more-or-less” controlled fashion, in a marginally straight line, without falling over, half to three-quarters of the way across the gym, for two weeks now. and, i just got “certified” to come in and use the gym for practicing unicycle on days that we’re not having class, so i actually have a place to practice.

/8 blocks

i now have three /8 blocks in my email filters.

25.0.0.0/8 in the UK, 53.0.0.0/8 in germany, and 133.0.0.0/8 in japan.

the “standard” email filters, built on “and/or” and “contains/does not contain”, break down when you’re dealing with 16.75 MILLION addresses.

they break down because you can’t just filter on 25. which appears in the middle and end of IP addresses, in message ID numbers, and, occaisionally, in the body of the message.

the result is A LOT of false positives: email which i can’t forward to the correct recipient, because it will get filtered AGAIN

which is quite annoying. 😒

so, with the help of my friend robert, i built a regular expression to handle it:

\D25\.\d{1,3}\.\d{1,3}\.\d{1,3}\D

finds non-digit character followed by “25.”, followed by three repititions of one to three digits, interspersed by periods, followed by another non-digit character.

technically, this regex could be adapted to accomodate any IP address, which means that, theoretically, i have a whole new, easier, and faster method of processing spam. 😈

the next step is to learn how to search for a specific range of digits… 😈

ETA 191127 i discovered that you can’t specify a range of digits with a regex. for that, you need a script, which is too much work. also, i determined that i DON’T need the white space character at the beginning and end of the regular expression, because, sometimes, the IP address is surrounded by parentheses, square brackets, or both.

ETA 191128 i changed it from white space character — \s — to non-digit character — \D — because some IP addresses are surrounded by parentheses or square brackets, but some are surrounded by white space characters. the only thing \D doesn’t capture is an empty string, so the IP address can’t be the first thing in the line of text.

and, even with the \D, this regex, modified to capture 27.16.0.0/12 in china, captures 2.2019.11.27.23.41.02, which is part of the message ID on a LEGITIMATE message. 😖😒😠🤬

this is why i’m rerouting these messages, rather than summarily deleting them, which is my inclination… summarily deleting what i think is spam has come back to bite me in the ass often enough that i don’t do it any longer. 😒

oy 😒

this morning i added a second /8 block to my email filters.

for those of you wondering what i’m talking about, a /8 block is the largest block of IP addresses allocated by the IANA.

16,777,216 individual IP addresses.

my first filtered /8 block was in japan. my second one was in germany.

and i STILL get spam from japan and from germany. 😒

it doesn’t seem like it was that long ago that spam was something in a monty python skit, and before that, it was a canned meat byproduct.

it’s not even UCE any longer, because most of it is devoted to scams of one kind or another. actual, commercial email is a tiny fraction of the volumes of script-generated spam, these days.

spam times 16,777,216²… which is a number so large my scientific calculator chokes on it… which is to say, it says 2.81474976711e+14 rather than giving me a number i can understand. 😒

knock wood…

for the first time in a VERY long time, i booted up my computer, checked my email, and did NOT have at least 10 “bitcoin-porn-scam-spam” messages in my spam folder…

in fact, i had NO “bitcoin-porn-scam-spam” messages in my spam folder… or anywhere else…

there was spam in my spam folder, but no “bitcoin-porn-scam-spam” messages.

maybe this is a good sign.

ETA: not as good a sign as i would have hoped, but on the plus side, i now have blocks on more of uzbekistan, kazakhstan, bangladesh, and south africa than i did before.

oh well… 🤷