A Broadband and ADSL forum. BroadbanterBanter

uk.telecom.broadband (UK broadband) (uk.telecom.broadband) Discussion of broadband services, technology and equipment as provided in the UK. Discussions of specific services based on ADSL, cable modems or other broadband technology are also on-topic. Advertising is not allowed.

Best way to extract and send ~10k emails



 
 
  #1  
Old October 3rd 12, 07:46 PM posted to uk.telecom.broadband
Peter

This is a genuine requirement, not spam...

I need to first extract email addresses from a huge (1GB) text file
which is in unix format (LF between lines, not CRLF) and which was
produced by exporting emails from the Agent email program.

That could be done with sed etc, but I need to de-duplicate the stuff
too.

The extraction needs to discard any names, so e.g. with

"Joe Bloggs"

I want to end up with only



and that can then be de-duplicated safely.
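
A minimal sketch of the name-stripping step, assuming Python is available. The address in the example above did not survive in this archive, so joe@example.com in the usage note is a made-up placeholder, and bare_address is just an illustrative name. parseaddr is part of the standard email.utils module.

```python
from email.utils import parseaddr


def bare_address(field):
    """Return only the address part of a 'Name <address>' style string."""
    _name, addr = parseaddr(field)  # parseaddr splits display name from address
    return addr
```

Called on a string such as '"Joe Bloggs" <joe@example.com>' it should hand back just the part inside the angle brackets, which is the form that can then be de-duplicated.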

Can anyone suggest a prog for this? I have downloaded a few (of the
many) free ones out there but none of them seem to work.

Then I need to email the lot with a short text-only email. I can't
really paste 10k email addresses into the BCC field of my email prog
and in any case a lot of spam filters dump emails addressed via
BCC, so I want to address them via TO.

There is a whole pile of spammer programs out there but I just want
something very simple. Also I want to transmit only about 1 per
minute, to avoid them being dumped by antispam measures at the big
ISPs.

Any suggestions much appreciated.
  #2  
Old October 3rd 12, 07:50 PM posted to uk.telecom.broadband
The Natural Philosopher

learn 'C'

Or awk sed and grep

--
Ineptocracy

(in-ep-toc'-ra-cy) - a system of government where the least capable to
lead are elected by the least capable of producing, and where the
members of society least likely to sustain themselves or succeed, are
rewarded with goods and services paid for by the confiscated wealth of a
diminishing number of producers.
  #3  
Old October 3rd 12, 08:05 PM posted to uk.telecom.broadband
Graham.

On Wed, 03 Oct 2012 19:50:22 +0100, The Natural Philosopher
wrote:

learn 'C'

Or awk sed and grep


Wordpad?
I had some database data last week that wasn't in the expected format.
Normally, if I look at it in Notepad, it's tabulated with each record on
a separate line, but this time it wasn't. The solution was to open it
in Wordpad and re-save it. So how does that work? Does Wordpad insert
a CR after it sees a LF?

--
Graham.
%Profound_observation%
  #4  
Old October 3rd 12, 08:20 PM posted to uk.telecom.broadband
alexd

The Natural Philosopher (for it is he) wrote:

learn 'C'


Ah yes, C, that wonderful text-processing language.

Or awk sed and grep


Here's a start for you:

http://www.putorius.net/2011/12/grep...text-file.html

--
http://ale.cx/ (AIM:troffasky) )
20:10:43 up 13:01, 3 users, load average: 0.76, 0.46, 0.44
Qua illic est reprehendit, illic est a vindicatum

  #5  
Old October 3rd 12, 09:10 PM posted to uk.telecom.broadband
Peter


alexd wrote

The Natural Philosopher (for it is he) wrote:

learn 'C'


Ah yes, C, that wonderful text-processing language.


Indeed

I can do C but it's a bit of work.

Or awk sed and grep


Here's a start for you:

http://www.putorius.net/2011/12/grep...text-file.html


That will extract them but it won't remove duplicates.

There must be readymade tools for this job...

BTW the reason the original file is so big is because it contains
stuff like uuencoded images etc. There should be only about 10k email
addresses among that stuff.
  #6  
Old October 3rd 12, 10:22 PM posted to uk.telecom.broadband
Bob Eager

On Wed, 03 Oct 2012 21:10:31 +0100, Peter wrote:

alexd wrote

The Natural Philosopher (for it is he) wrote:

learn 'C'


Ah yes, C, that wonderful text-processing language.


Indeed

I can do C but it's a bit of work.

Or awk sed and grep


Here's a start for you:

http://www.putorius.net/2011/12/grep...ses-from-text-file.html

That will extract them but it won't remove duplicates.


If you have a file, with email addresses one to a line, then it's trivial
with just two commands.

Use sort to sort the file.

Use uniq to remove duplicates.
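
Where sort and uniq are not to hand, the same two steps can be sketched in Python. This is only an illustration: the function name is invented here, and lower-casing addresses before comparing them is an assumption about what counts as a duplicate, not something stated in the thread.

```python
def dedupe_addresses(lines):
    """Return a sorted list of unique addresses from an iterable of lines."""
    seen = set()
    for line in lines:
        addr = line.strip().lower()  # assumption: compare case-insensitively
        if addr:                     # skip blank lines
            seen.add(addr)
    return sorted(seen)
```

Passing it an open file with one address per line gives back the list that sort followed by uniq would produce, apart from the case folding.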



--
Use the BIG mirror service in the UK:
http://www.mirrorservice.org

*lightning protection* - a w_tom conductor
  #7  
Old October 3rd 12, 10:47 PM posted to uk.telecom.broadband
Martin Brown

On 03/10/2012 21:10, Peter wrote:

alexd wrote

The Natural Philosopher (for it is he) wrote:

learn 'C'


Ah yes, C, that wonderful text-processing language.


Indeed

I can do C but it's a bit of work.

Or awk sed and grep


Here's a start for you:

http://www.putorius.net/2011/12/grep...text-file.html


That will extract them but it won't remove duplicates.

There must be readymade tools for this job...


Excel could do it once you have a list of extracted email addresses.
I can't help feeling that you are just another lazy spammer though.


--
Regards,
Martin Brown
  #8  
Old October 3rd 12, 10:51 PM posted to uk.telecom.broadband
The Natural Philosopher

Peter wrote:
alexd wrote

The Natural Philosopher (for it is he) wrote:

learn 'C'

Ah yes, C, that wonderful text-processing language.


Indeed

I can do C but it's a bit of work.

Or awk sed and grep

Here's a start for you:

http://www.putorius.net/2011/12/grep...text-file.html


That will extract them but it won't remove duplicates.

| sort -u

There must be readymade tools for this job...

gcc + vi


BTW the reason the original file is so big is because it contains
stuff like uuencoded images etc. There should be only about 10k email
addresses among that stuff.


I usually start farting about with grep

Grep for lines with @ in to start.
That gets most of the **** out
If the stuff is in angle brackets awk away
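
For comparison, a rough Python sketch of the same two passes: first keep only lines containing @, then pull out whatever sits inside angle brackets. The regular expression is an assumption about what the addresses look like (no spaces, exactly one @), and the names are invented for illustration. Message-ID headers have the same shape, so they would be picked up as well.

```python
import re

# Assumed pattern: something@something inside <...>, with no whitespace.
ANGLE_ADDR = re.compile(r"<([^<>\s@]+@[^<>\s@]+)>")


def angle_bracket_addresses(lines):
    """Yield addresses found inside angle brackets on lines containing '@'."""
    for line in lines:
        if "@" not in line:  # cheap first pass, like grepping for @
            continue
        for match in ANGLE_ADDR.finditer(line):
            yield match.group(1)
```

Because it works line by line, a 1GB file never has to fit in memory. Opening the file with encoding="latin-1" is one way to avoid decode errors on the uuencoded parts, since that encoding accepts any byte.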

  #9  
Old October 3rd 12, 11:47 PM posted to uk.telecom.broadband
Theo Markettos

Peter wrote:

alexd wrote

The Natural Philosopher (for it is he) wrote:

learn 'C'


Ah yes, C, that wonderful text-processing language.


Indeed

I can do C but it's a bit of work.


If you're doing text processing these days, use perl or python (maybe with a
suitable library). There is really really no reason to use C. You don't
need to know much of the language for this sort of job - google will suffice
for working out how to do most of the things you need.

Or awk sed and grep


And that's painful too - how long are you going to spend tweaking your
regex? Is it going to cope with all the characters that are legal in email
addresses (a lot more than you think, including international characters)?

That will extract them but it won't remove duplicates.


The other issue is that grepping for [email protected] will pick
up message-IDs, and you might get false-positives from encoded stuff:
]426^&1$%"
And what about email addresses mentioned in the body of messages? Do you
want those or not?

Python has an email.parser module that does all this for you - see the
second and fifth examples:
http://docs.python.org/library/email-examples.html

The Perl equivalent is here:
http://search.cpan.org/~rjbs/Email-S...mail/Simple.pm

For dedup I'm sure there's a perl/python way, but I'd just pipe the list
to | sort | uniq
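
A minimal sketch of the Python route, assuming the export has already been split into individual messages (the thread does not describe the Agent export format, so that step is left out). It reads only the From, To and Cc headers, which is an assumption about which addresses are wanted, so Message-IDs and addresses in the body are ignored. message_from_string and getaddresses are standard library calls; the other names are illustrative.

```python
import email
from email.utils import getaddresses

HEADERS = ("From", "To", "Cc")  # assumption: only these headers matter


def header_addresses(message_text):
    """Return the set of bare addresses in one message's From/To/Cc headers."""
    msg = email.message_from_string(message_text)
    found = set()
    for name in HEADERS:
        # get_all returns every occurrence of the header, or [] if absent
        for _realname, addr in getaddresses(msg.get_all(name, [])):
            if "@" in addr:
                found.add(addr.lower())  # assumption: case-insensitive dedup
    return found
```

Collecting the results into one set across all messages also takes care of the de-duplication. If the export turns out to be in mbox format, the standard mailbox module can iterate over the messages.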

Theo
  #10  
Old October 4th 12, 01:06 AM posted to uk.telecom.broadband
The Natural Philosopher

Theo Markettos wrote:
Peter wrote:
alexd wrote

The Natural Philosopher (for it is he) wrote:

learn 'C'
Ah yes, C, that wonderful text-processing language.

Indeed

I can do C but it's a bit of work.


If you're doing text processing these days, use perl or python (maybe with a
suitable library). There is really really no reason to use C.


The reason is you don't have to learn yet another Comp Sci's masturbatory
ejaculations.

You don't
need to know much of the language for this sort of job - google will suffice
for working out how to do most of the things you need.

Or awk sed and grep


And that's painful too - how long are you going to spend tweaking your
regex? Is it going to cope with all the characters that are legal in email
addresses (a lot more than you think, including international characters)?


Well, exactly. That's why I use C.

 



