$htmlconv() by FiberOPtics
Average Rating: 9.8
Description:
Snippet to strip html tags from a string, as well as convert any html entity that IE can process. Includes a bonus snippet which can convert an entire file at once.

Comments:


Snarx, Aug 3, 2007 2:26PM
There's an error.
&ecute should be &eacute; (é) and &Ecute should be &Eacute; (É) in the htmlconv alias.

macrobody, Oct 22, 2006 5:23PM
Rating: 9
First of all, I want to say that the htmlconv snippet is really good and fast.

One question though:


After the conversion, the html lines are concatenated without spaces. If the code is like <font>abcd</font><font>efgh</font>, it ends up as abcdefgh, while I need abcd efgh.

Is there a way to fix this?

FiberOPtics, Oct 22, 2006 5:32PM
It is normal that there is no space in the result: there are no spaces in your example, so the snippet should not arbitrarily insert any either. In other words, there is no need for a "fix" because the snippet isn't broken.

You could make a change to the snippet like this:

Change: var %t, %u = $regsub($replace($1,<br>,$crlf),/^[^<]*>|<[^>]*>|<[^>]*$/g,,%t)

to: var %t, %u = $regsub($replace($1,<br>,$crlf),/^[^<]*>|<[^>]*>|<[^>]*$/g,$chr(32),%t)

Instead of replacing html tags with $null (nothing) it will replace them with a space because of $chr(32).
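
For readers who want to try the effect of that change outside mIRC, here is a minimal Python analog. It is a sketch, not the snippet's actual code: the regular expression is copied from the line above, but the function name strip_tags and its filler parameter are made up for illustration, and str.replace is case-sensitive, so only a lowercase <br> is converted here.

```python
import re

# Same pattern as in the snippet line quoted above:
# a leading partial tag, any complete tag, or a trailing partial tag.
TAG_RE = re.compile(r"^[^<]*>|<[^>]*>|<[^>]*$")

def strip_tags(text, filler=""):
    """Turn <br> into CRLF, then replace every tag match with `filler`.

    `filler` is a hypothetical parameter: "" mimics the original line,
    " " mimics the $chr(32) variant.
    """
    return TAG_RE.sub(filler, text.replace("<br>", "\r\n"))

sample = "<font>abcd</font><font>efgh</font>"
print(repr(strip_tags(sample)))       # tags removed outright
print(repr(strip_tags(sample, " ")))  # every tag becomes one space
# Collapse the runs of spaces that the second variant leaves behind.
print(repr(" ".join(strip_tags(sample, " ").split())))
```

Because every tag becomes a space, the second variant leaves leading, trailing and doubled spaces in the Python string, which is why the last line collapses them.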

Text edited by author on Oct 22, 2006 @ 5:34PM


macrobody, Oct 24, 2006 11:10AM
That would be a change in the alias htmlconv, but I don't know how to make it in the alias _htmlconv, which is the one I am using because it can convert a whole html file at once within a second. The text file I now end up with is missing a lot of spaces that I need, and there is no way to determine afterwards where the spaces should be. I know this is because all html code is removed and there are no spaces in it, but maybe you have an idea how I can implement it in _htmlconv.

Cheers.

FiberOPtics, Oct 24, 2006 12:38PM
Since _htmlconv indeed converts an entire html file at once, there is no way for me to intervene. It doesn't look at the html line by line; it takes the entire file and interprets it as if it were a browser displaying the code.

Why don't you just add a space to the file at the right place before using _htmlconv? Seems like an easy fix, no? ;)
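
One way to read "add a space at the right place" is to pre-process the file contents so that a tag directly followed by another tag gets a space in between. The Python sketch below only illustrates that idea; the function name space_adjacent_tags is hypothetical, and it assumes that extra spaces between other adjacent tags are harmless for your input.

```python
import re

def space_adjacent_tags(html_text):
    """Insert one space wherever a '>' is immediately followed by a '<'."""
    return re.sub(r">(?=<)", "> ", html_text)

before = "<font>abcd</font><font>efgh</font>"
print(space_adjacent_tags(before))
```

The pre-processed text would then be written back to the file before the whole-file conversion runs.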

mruno, Feb 25, 2006 9:18AM
Rating: 10
Very nice. I do not see any pause in my mIRC at all. Keep up the good work!

myggan, Feb 5, 2006 11:06AM
What purpose does the COM object serve in $htmlconv? Executing the alias for the first time froze my mIRC for 1.5-2 seconds, and I was not running any resource-heavy applications.

FiberOPtics, Feb 5, 2006 12:45PM
As stated in the documentation, the COM is there to translate any possible html entity that IE recognises. It's either that, or include a shitload of hardcoded html entities in a $replace, similar to the relatively short one that I included in the code. The first version of the snippet had only the COM, but I found it too slow, hence the custom mIRC coding.

I am only repeating what I have already explained in the documentation btw...

COM freezing the first time you issue it is not uncommon (any WMI script suffers from it), it's like that in many many cases possibly due to mIRC, but after the very first time it should be fine.

Although, I never get a freeze even at first try with this snippet (I just tested now on a fresh copy of mIRC, after a reboot, so no COM has yet been initiated anywhere on the system). Is Norton running on your system?

I may include another property in a future update which specifies that no COM may be used. Then you still have the stripping of tags, and translation of common entities, but the remaining entities will either have to be stripped or left in. Another possibility is that I change decimal entities and hex entities like & #44; with $regex, although then, mIRC would change & #38;& #97;& #109;& #112;& #059; (I put spaces on purpose) to & amp;, which the COM code would change to &. That's not what we wanted, but I suppose those occasions will be really rare. For the time being I will let it be as it is, and wait for some more responses.
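
The double-decoding concern above can be reproduced with a small Python analog. This is a sketch under assumptions, not the snippet's code: the tiny NAMED table stands in for whatever resolves named entities, and all function names are made up. Decoding numeric entities first and named entities second turns the example into a bare &, while a single pass that never rescans its own output leaves the & amp; text (written here without the space) alone.

```python
import re

# Hypothetical, deliberately tiny named-entity table for the demo.
NAMED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"'}

NUMERIC_RE = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")
NAMED_RE = re.compile(r"&([A-Za-z]+);")
ANY_RE = re.compile(r"&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z]+));")

def _char(hex_digits, dec_digits, original):
    """Convert a numeric reference to a character, or keep it if out of range."""
    code = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return chr(code) if code <= 0x10FFFF else original

def decode_numeric(text):
    return NUMERIC_RE.sub(lambda m: _char(m.group(1), m.group(2), m.group(0)), text)

def decode_named(text):
    return NAMED_RE.sub(lambda m: NAMED.get(m.group(1), m.group(0)), text)

def decode_two_pass(text):
    """Numeric pass, then named pass: the output of pass one is rescanned."""
    return decode_named(decode_numeric(text))

def decode_once(text):
    """Single pass over the input: replacements are never rescanned."""
    def repl(m):
        hex_digits, dec_digits, name = m.groups()
        if name is not None:
            return NAMED.get(name, m.group(0))
        return _char(hex_digits, dec_digits, m.group(0))
    return ANY_RE.sub(repl, text)

tricky = "&#38;&#97;&#109;&#112;&#059;"  # spells out the five characters of "&amp;"
print(decode_two_pass(tricky))
print(decode_once(tricky))
```

A single combined pass is one way to avoid the problem, at the cost of needing the named-entity table inside the same routine.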

Anyway, you are the first person for whom it freezes. Quite a few people tested it for me before release and none of them had that problem, so I'm assuming you have some kind of system monitoring program that tripped at first execution, as there's no reason at all for it to freeze.

Text edited by author on Feb 5, 2006 @ 1:07PM


myggan, Feb 5, 2006 2:22PM
I have no monitoring applications running, but how it occurred is kind of irrelevant (as you said yourself, it's a common thing).

I do think if you *can* do it without COM (unless it improves performance), you should do so.
I have my own alias which parses decimal/hex entities correctly, as well as hardcoded named entities. All without COM and with no speed issues. If you want to take a peek, drop me a PM.

Nice job anyhow

EDIT: Just did a speed test comparing our snippets.

//var %i = 1, %t = $ticks | while %i <= 100 { !.echo -q $htmlconv(& #33;&# x8f;<a>moo & nbsp; & amp; </a>) | inc %i } | echo -a result: $calc($ticks - %t) ms

The average result for yours was 1735 ms, while mine was around 219 ms, which is a big difference.

Note that this is not a brag-post, I just want to point out that COM is a slow option for this particular purpose.

Text edited by author on Feb 5, 2006 @ 2:41PM


FiberOPtics, Feb 5, 2006 2:50PM
As I said, it is no problem for me to hardcode every single html entity (it's simple $replace/$regex/$regsub), HOWEVER if you want to be complete, it will make the alias 4 times as long, and then no one is going to want to use it. Your alias is never going to be as complete as mine. Don't forget that my purpose with this snippet is both speed AND completeness, in relatively compact code.

Your results are obvious, but biased, because you are always forcing the COM to be used, whereas in normal situations, the COM code will only be used if mIRC didn't replace it already, therefore you should bench on variable strings, not always the same string. It's a battle that $htmlconv can't win with your test string, because COM must be called each time. In real time situations, when processing html line by line, you will notice that the COM is called less than 5% of the time.
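
The "fast path first, slow fallback only when needed" idea can be sketched in Python to show why a fixed test string skews the numbers. Everything here is an assumption made for illustration: the Converter class, the small COMMON table and the counter are hypothetical, and html.unescape merely stands in for the slow COM call.

```python
import html
import re

# Hypothetical short table of common entities handled on the fast path.
COMMON = {"&nbsp;": "\u00a0", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&amp;": "&"}
ENTITY_RE = re.compile(r"&#?\w+;")

class Converter:
    def __init__(self):
        self.fallback_calls = 0  # how often the slow path was needed

    def convert(self, text):
        unknown = False

        def repl(match):
            nonlocal unknown
            replacement = COMMON.get(match.group(0))
            if replacement is None:
                unknown = True  # an entity the fast table cannot handle
                return match.group(0)
            return replacement

        fast = ENTITY_RE.sub(repl, text)
        if not unknown:
            return fast
        # Slow path: hand the original text to the complete decoder.
        self.fallback_calls += 1
        return html.unescape(text)

# Varied lines, as when processing a page line by line.
varied = Converter()
lines = ["plain text", "a &amp; b", "&lt;tag&gt;", "caf&eacute;"]
results = [varied.convert(line) for line in lines]
print(results, varied.fallback_calls)

# One fixed line containing an entity outside the table, repeated 100 times.
fixed = Converter()
for _ in range(100):
    fixed.convert("&#33;<a>moo &nbsp; &amp; </a>")
print(fixed.fallback_calls)
```

With the varied lines the fallback fires only for the one line that contains an entity outside the table, whereas the repeated fixed string forces it on every iteration.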

I wanted to be different with this alias, compared to others, in the sense that this will convert any html entity that IE can convert, not just a hardcoded set of common ones. In case you don't know, there are literally hundreds of them (I am not talking about decimal/hex ones, I mean ones like & quot;), which would mean you need to include at least 10 $replace identifiers that are all at the "string too long" limit. All on the assumption, of course, that, like me, you want to be complete.

It's hard to please everyone, I know, but I don't think people want to include a html converter which is longer than their socket code to begin with. With this alias, I think I'm holding a nice middle ground. Like I already stated, the COM is there for completeness' sake: take a random html page (heck, take this one) and you will see that it will probably not even contain one line that the COM needs to process. Therefore, it's wrong to stare blindly at that COM code, or to show me benches of code that you don't even need to bench. I don't need to bench that code to tell that it goes faster without the COM, I _know_ it goes faster with $replace, that is obvious. Look at the bigger picture.

I like to think of the COM as a rescue boat, it's not utterly important that the boat goes slow, because emergency situations don't arise often, so the only thing that really matters is that you're saved. You get the point.

There, I got nothing further to add.

Text edited by author on Feb 5, 2006 @ 3:29PM


myggan, Feb 5, 2006 3:52PM
Point taken

Ghost2, Feb 5, 2006 2:36PM
Rating: 10
Good work man. I was looking for a script like this a few days ago and was hoping I wouldn't have to write the conversion tables

mafix, Feb 4, 2006 8:56AM
Rating: 10
Nice, very nice.

seamaster, Feb 4, 2006 7:07AM
Rating: 9
Good work! The code is very well done.

Mpdreamz, Feb 4, 2006 6:02AM
Rating: 10
Nice work :P

zzattack, Feb 4, 2006 4:53AM
Rating: 10
best ever

jaytea, Feb 4, 2006 12:03AM
Rating: 10
if this script was a car battery i would WITHOUT QUESTION hook myself up to it via those enormous crocodile clips clasped firmly around my nipples. you are indeed a man's man :*

FiberOPtics, Feb 4, 2006 4:29AM
ROFL


