A Broadband and ADSL forum. BroadbanterBanter

uk.comp.home-networking (UK home networking) (uk.comp.home-networking) Discussion of all aspects of computer networking in the home, regardless of the platforms, software, topologies and protocols used. Examples of topics include recommendations for hardware or suppliers (e.g. NICs and cabling), protocols, servers, and specific network software. Advertising is not allowed.

'fetch' or 'get' a web served page text



 
 
  #1  
November 22nd 08, 03:20 PM, posted to uk.comp.home-networking
Will

I'm trying to automate getting the text from a web page into a text file.

I can do this by using IE6 and copying the text into a TXT file by hand,
but I'm not always quick enough to catch all the data (it only holds one
page at a time). The page is served over HTTP; I can't telnet or FTP to
the device directly.

Is there another option to do what is described above?



  #2  
November 23rd 08, 09:30 AM, posted to uk.comp.home-networking
Alex Fraser

Will wrote:
I'm trying to automate getting the text from a web page into a text file.

I can do this by using IE6 and copying the text into a TXT file by hand,
but I'm not always quick enough to catch all the data (it only holds one
page at a time). The page is served over HTTP; I can't telnet or FTP to
the device directly.

Is there another option to do what is described above?


I can't work out why you say "I'm not always quick enough ...". Can you
elaborate?

On Linux, I used to have a periodically-scheduled script that used lynx
to get plain text rendering of HTML pages and then extract information
from them. A quick Google finds Win32 ports, but I've never tried them:
eg http://www.fdisk.com/doslynx/lynxport.htm

You can get the HTML itself using something like wget (amenable to
scripting) or a download manager.

Alex
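
For readers who would rather not install lynx, the "plain text rendering" step can be approximated with Python's standard library. This is only a minimal sketch, not what Alex's script did: the class and function names are made up for illustration, and it simply keeps the visible text while skipping script and style blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, ignoring <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Feeding it the HTML saved by wget or a download manager gives a rough text version that ordinary text-processing tools can then search.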
  #3  
November 23rd 08, 02:05 PM, posted to uk.comp.home-networking
Will

Alex Fraser wrote:
I can't work out why you say "I'm not always quick enough ...". Can you
elaborate?

On Linux, I used to have a periodically-scheduled script that used lynx to
get plain text rendering of HTML pages and then extract information from
them. A quick Google finds Win32 ports, but I've never tried them: eg
http://www.fdisk.com/doslynx/lynxport.htm

You can get the HTML itself using something like wget (amenable to
scripting) or a download manager.

Alex


Ta.
The log is a rolling log, and I'm looking for error clues.
I'm not always quick enough because I might not use the PC/console every
day and the log is very limited in size, so I could do with an automated
copy every day or so.
It sounds like I need something very similar to your script.
I can't see how the Win32 port helps me, or wget either. Aren't these all
Linux tools? Alas, this is on MS Windows XP (hence IE6, not used out of
choice!)
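
Because the log rolls over, successive daily copies will overlap. A minimal sketch of merging them, assuming each log line is unique (for example, because it starts with a date and time) so that a repeated line can be treated as already archived. The function names and the archive file are hypothetical.

```python
def new_entries(seen_lines, snapshot_lines):
    """Return snapshot lines not already archived, preserving their order."""
    seen = set(seen_lines)
    fresh = []
    for line in snapshot_lines:
        line = line.rstrip("\n")
        if line and line not in seen:
            seen.add(line)
            fresh.append(line)
    return fresh


def append_snapshot(archive_path, snapshot_lines):
    """Append only the unseen lines of a snapshot to the archive file.

    Returns the number of lines added. Assumption: identical lines are
    duplicates, so genuinely repeated events without a timestamp are lost.
    """
    try:
        with open(archive_path, encoding="utf-8") as archive:
            seen = [line.rstrip("\n") for line in archive]
    except FileNotFoundError:
        seen = []  # first run: nothing archived yet
    fresh = new_entries(seen, snapshot_lines)
    with open(archive_path, "a", encoding="utf-8") as archive:
        for line in fresh:
            archive.write(line + "\n")
    return len(fresh)
```

Run once per copy, this keeps one growing text file instead of a pile of overlapping snapshots.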


  #4  
November 23rd 08, 05:19 PM, posted to uk.comp.home-networking
Alex Fraser

Will wrote:
The log is a rolling log, and I'm looking for error clues.
I'm not always quick enough because I might not use the PC/console every
day and the log is very limited in size, so I could do with an automated
copy every day or so.


OK, that makes sense. However, if this is for something like a router,
I'm surprised there isn't an alternative method, such as syslog.

Will wrote:
It sounds like I need something very similar to your script.
I can't see how the Win32 port helps me, or wget either. Aren't these all
Linux tools? Alas, this is on MS Windows XP (hence IE6, not used out of
choice!)


wget is available as a native Windows utility and can be used to
download an HTML page, as HTML, to a file. You could run this with Task
Scheduler daily using "wget http://whatever/whatever.html". This will
create whatever.html, whatever.html.1, whatever.html.2, and so on - one
file for each time it runs. See http://users.ugent.be/~bpuype/wget/.

If this is insufficient, can you let me know some more details? I'll see
if I can come up with a simple way of doing it under Windows (which may
be to install Cygwin and do it the Unix/Linux way).

Alex
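
The same daily grab can be sketched in Python for anyone who prefers dated file names to wget's .1, .2 suffixes. This is an illustration under assumptions, not a replacement for the wget approach: the function names and the "log" file stem are invented, and the sketch does no authentication.

```python
import datetime
import pathlib
import urllib.request


def snapshot_path(directory, now, stem="log", suffix=".html"):
    """Build a file name such as log-20081124-170022.html inside directory."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return pathlib.Path(directory) / f"{stem}-{stamp}{suffix}"


def save_snapshot(url, directory, now=None):
    """Download url and write the raw bytes to a new timestamped file."""
    now = now or datetime.datetime.now()
    with urllib.request.urlopen(url, timeout=30) as response:
        body = response.read()
    target = snapshot_path(directory, now)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return target
```

A one-line script calling save_snapshot("http://whatever/whatever.html", "snapshots") could then be run daily from Task Scheduler in the same way as the wget command.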
  #5  
November 24th 08, 05:18 PM, posted to uk.comp.home-networking
Will

Alex Fraser wrote:
wget is available as a native Windows utility and can be used to download
an HTML page, as HTML, to a file. You could run this with Task Scheduler
daily using "wget http://whatever/whatever.html". This will create
whatever.html, whatever.html.1, whatever.html.2, and so on - one file for
each time it runs. See http://users.ugent.be/~bpuype/wget/.

If this is insufficient, can you let me know some more details? I'll see
if I can come up with a simple way of doing it under Windows (which may be
to install Cygwin and do it the Unix/Linux way).


I tried wget and it looks like just the ticket (at least for the grab), but
I can't get past a 401 error. The user/password should be right; it could
be the CGI. The main page works OK with the user/password, although they
are not required there. The command screen page returns this (IP and dirs
***'d out):

--2008-11-24 17:00:22-- http://***.***.***.***/cgi/***/***/
Connecting to ***.***.***.***:80... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Connecting to ***.***.***.***:80... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.

Usage: the URL requires a username and password (there is no way around
submitting these, although my IE is set to remember them in its password
list). The web page returned is a single page with some Java (doh, I bet
it's the Java) for menu navigation (which I ignore), a form selection
(which I ignore because it is a filter and it defaults to normal, which is
what I want) and the tabled text of the log (which I need to scrape). I
want to read that and record it in a TXT file (date/time/event).

I do wonder if a macro recorder run as a scheduled task might be better
suited.
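
Once the page HTML has been saved, pulling the tabled log text out of it could look something like the sketch below. The real page layout is not shown in this thread, so the assumption here is an ordinary HTML table with one log entry per row and date/time/event in separate cells. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def log_table_to_lines(html):
    """Return one tab-separated text line per table row found in html."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return ["\t".join(row) for row in parser.rows]
```

Menu and form markup outside table cells is ignored, so only the rows end up in the text file. If the page uses tables for layout as well, the rows would need filtering further.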


  #6  
November 24th 08, 07:55 PM, posted to uk.comp.home-networking
Alex Fraser

Will wrote:
[snip]
I tried wget and it looks like just the ticket (at least for the grab), but
I can't get past a 401 error. The user/password should be right; it could
be the CGI. The main page works OK with the user/password, although they
are not required there. The command screen page returns this (IP and dirs
***'d out):


There are various possibilities for a 401, which can probably be
resolved, but possibly not without excessive communication here or
giving me (or someone else) access.

Basically, you may need to repeat the request, and you may need to make
one or more "dummy" requests to set a session cookie and get the system
to the right state. Using "-d" can help you work out what's going on.
You can use "-i" to give a file containing a list of requests to be made
in sequence. By default, cookies set in a response will be sent on
relevant subsequent requests.

Will wrote:
Usage: the URL requires a username and password (there is no way around
submitting these, although my IE is set to remember them in its password
list). The web page returned is a single page with some Java (doh, I bet
it's the Java) for menu navigation (which I ignore), a form selection
(which I ignore because it is a filter and it defaults to normal, which is
what I want) and the tabled text of the log (which I need to scrape). I
want to read that and record it in a TXT file (date/time/event).

I do wonder if a macro recorder run as a scheduled task might be better
suited.


Scheduling the replay of a macro recording should work if you grab the
entire text of the page (i.e. Ctrl-A, Ctrl-C, open Notepad, Ctrl-V). Being
more selective will probably be impossible. That's where text-processing
tools come in.

I still remain surprised that there is no alternative to the web
interface to get the log information out of the system.

Alex
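
The "repeat the request, keep the cookies" idea can also be sketched with Python's urllib. This is a hedged illustration only: nothing in the thread confirms which authentication scheme the device uses, so the sketch offers both Basic and Digest and keeps any session cookie between requests. The helper names are made up, and the address in the usage note is a placeholder.

```python
import http.cookiejar
import urllib.request


def make_opener(base_url, username, password):
    """Build an opener that answers 401 challenges and remembers cookies."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm under base_url".
    passwords.add_password(None, base_url, username, password)
    cookies = http.cookiejar.CookieJar()
    return urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(passwords),
        urllib.request.HTTPDigestAuthHandler(passwords),
        urllib.request.HTTPCookieProcessor(cookies),
    )


def fetch_sequence(opener, urls):
    """Request each URL in order and return the body of the last one.

    Earlier URLs act as the "dummy" requests that may set a session cookie.
    """
    body = b""
    for url in urls:
        with opener.open(url, timeout=30) as response:
            body = response.read()
    return body
```

For example, make_opener("http://192.0.2.1/", "user", "password") followed by fetch_sequence(opener, [main_page_url, log_page_url]) would visit the main page first and then the log page with whatever cookie was set. If the device still answers 401, the cause lies elsewhere.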
  #7  
November 28th 08, 03:11 AM, posted to uk.comp.home-networking
Dave J.

On Mon, 24 Nov 2008 19:55:18 +0000, in uk.comp.home-networking,
'Alex Fraser' wrote:

I still remain surprised that there is no alternative to the web
interface to get the log information out of the system.


I've seen one host where (s?)ftp access (including to log files) was
'optional' - you enable it from the SSH/telnet prompt by specifying a
password before it lets you in.

Not saying the above's true in the case of the original poster, just that
lack of immediate access doesn't mean it's impossible. Depends on what's
been tried and with what results. Certainly a nicer option than messing
about translating html.

Dave J.
  #8  
November 29th 08, 04:21 PM, posted to uk.comp.home-networking
Will


"Dave J." wrote:
On Mon, 24 Nov 2008 19:55:18 +0000, in uk.comp.home-networking,
'Alex Fraser' wrote:

I still remain surprised that there is no alternative to the web
interface to get the log information out of the system.


I've seen one host where (s?)ftp access (including to log files) was
'optional' - you enable it from the SSH/telnet prompt by specifying a
password before it lets you in.

Not saying the above's true in the case of the original poster, just that
lack of immediate access doesn't mean it's impossible. Depends on what's
been tried and with what results. Certainly a nicer option than messing
about translating html.

Dave J


It's certainly proving time-consuming. I can't get it to retry the request
and I'm still hitting the 401. I think I'll have to stick with catching
what I can with a web browser and cut and paste as before.
No, telnet is disabled and there is no FTP.


 



