Monday, May 2, 2011

Extract href links from HTML source in MATLAB

This MATLAB script finds the lines of an HTML source file that contain href links.
The matching lines are saved in a text file.


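%%
% Count the matches of a search string on each line of an HTML source file
% and copy every line that contains at least one match to a text file.
% Tip: it helps to first put each "href" tag on its own line, e.g. with
% find-and-replace (Ctrl+H) and a manual line break in a word processor.
home
clc
filename = 'textsrc.txt';   % HTML source saved as plain text
literal = ' <a href="';     % search string (note the leading space)

fid = fopen(filename, 'rt');
if fid == -1
    error('Could not open %s', filename);
end
bbase = 'dsave2';           % base name of the output file
fid_sh = fopen([bbase '.txt'], 'w');

y = 0;                      % total number of matches found
jj = 1;                     % line counter
while ~feof(fid)
    tline = fgetl(fid);
    if ~ischar(tline), break; end   % fgetl returns -1 at end of file

    matches = strfind(tline, literal);  % strfind replaces the obsolete findstr
    num = length(matches);
    if num > 0
        y = y + num;
        % fprintf('%s\n', tline);
        fprintf(fid_sh, '%s\n', tline);
    end
    jj = jj + 1;
end
fclose(fid);
fclose(fid_sh);
% The matching lines are written to dsave2.txt,
% which is then processed further by refinestr.m.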
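
The script above keeps whole lines, so the URL itself still has to be pulled out of each line. The following is a minimal sketch of one way to do that with a regular expression. It is not the refinestr.m mentioned above, which is not shown in this post. The sample line and the variable names tokens and links are made up for illustration, and the pattern assumes that href is the first attribute of the tag and that the URL is enclosed in double quotes.

```matlab
% Hypothetical sample line; in practice each line would be read from dsave2.txt.
tline = 'See <a href="http://example.com/a">A</a> and <a href="http://example.com/b">B</a>';

% Capture the text between the double quotes that follow <a href=.
% Assumes href is the first attribute of the tag.
tokens = regexp(tline, '<a\s+href="([^"]*)"', 'tokens');

% Each token is a 1x1 cell; flatten them into a cell array of strings.
links = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);

% Print one URL per line.
for k = 1:numel(links)
    fprintf('%s\n', links{k});
end
```

To save the URLs to a file instead of printing them, pass a file identifier from fopen as the first argument to fprintf, as in the script above.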

