vb string functions speed vs winapi

Message 1 of 13

Anonymous
I've been working on a VB routine to read text files and parse strings for
some time now, and I have it 'almost nearly working'.

I've heard that VB string functions are inherently slow, and the total time
for reading 100+ files, examining each line, parsing, building substrings,
and entering the results in collections is running up to 90 seconds.

I'm wondering whether it would be worth the time to rewrite most or all of my
string functions from VB to WinAPI calls, in hopes of speeding things up.
I'm also wondering whether some functions would be likely to yield a greater
efficiency improvement than others, e.g. lstrcmp, strpos, etc.?
Does anyone have any insight into this idea?

I suspect my slowest function is the one that adds a string to a collection,
inserting it in alphabetical order. I should probably rewrite that to add
the items sequentially and then alphabetize when it's all done.
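The two strategies can be compared in a small sketch. It is written in Python rather than VB, operates on a plain list rather than a VB Collection, and the function names are hypothetical. Keeping the list sorted on every insert costs up to O(n) per item because later items have to be shifted (and if the insert position is found by scanning, that scan adds another O(n)), so n inserts approach O(n²) work; appending everything and sorting once is O(n log n).

```python
import bisect


def build_sorted_incrementally(names):
    """Insert each name at its sorted position as it arrives."""
    result = []
    for name in names:
        # bisect finds the slot quickly, but inserting into the middle
        # shifts every later element, so each insert costs O(n).
        bisect.insort(result, name)
    return result


def build_then_sort(names):
    """Append names in arrival order, then sort once at the end."""
    result = []
    for name in names:
        result.append(name)   # cheap O(1) append
    result.sort()             # one O(n log n) sort
    return result


if __name__ == "__main__":
    found = ["vl-string-trim", "c:zoomall", "add-layer", "make-block"]
    print(build_then_sort(found))
```

Both functions return the same list; only the amount of work done along the way differs.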

I'll have to search through the API to see if there are functions for
filling collections...
Maybe I could simulate a collection with CopyMemory or FillMemory? But I don't
see a "read memory" function, so I don't know how I'd read it back after filling it.

Any inspiration appreciated,
TIA
Mark
Message 2 of 13

Anonymous
I could probably offer some more specific suggestions if you provided a bit more info on what you're trying to do. However, my thoughts at the moment are:
  
1. Sometimes you can get a performance improvement if you pull an *entire* file into a string variable in one read and then process the lines from memory rather than from disk, possibly using the Split function with vbCrLf as the separator.
2. Off the top of my head, CopyMemory is an API that can help specifically with string processing.
3. I would agree that doing an insertion sort on a collection is likely slower than sorting once the collection is complete, with the caveat that, depending on what you're trying to do, a collection may not be the right thing to use in the first place!
4. http://www.vb2themax.com is an excellent resource for string handling and sorting. Have a good look around the Code Bank, the Optimization Bank and the Article Bank, but in particular see the article "Play VB's Strings".
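The first suggestion (one read from disk, then splitting in memory) looks roughly like this as a Python sketch. `read_lines_at_once` and `split_lines` are hypothetical names, and the Latin-1 default encoding is an assumption about the .lsp files; in VB the same idea is one read into a string followed by Split with vbCrLf.

```python
def split_lines(text):
    # str.splitlines() treats "\r\n", "\n" and "\r" as line breaks and, unlike
    # Split(text, vbCrLf), returns no empty last element when the text ends
    # with a line break.
    return text.splitlines()


def read_lines_at_once(path, encoding="latin-1"):
    # A single read pulls the whole file into memory; newline="" keeps the
    # original line endings so split_lines sees them unchanged.
    with open(path, encoding=encoding, newline="") as f:
        return split_lines(f.read())
```

Usage: `for line in read_lines_at_once("mylisp.lsp"): ...` where the file name is only an example. All per-line processing then works on the in-memory list.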
  
Regards
  
Wayne Ivory
IT Analyst Programmer
Wespine Industries Pty Ltd
Message 3 of 13

Anonymous
Thanks Wayne,

"wivory" wrote in message
news:f198bca.0@WebX.maYIadrTaRb...
> Probably could offer some more specific suggestions if you were to provide
> a bit more info on what you're trying to do.

I'm reading all my Lisp source files, extracting every function name and
argument list, and putting them into a master file for indexing and searching.
I have so many functions that I sometimes forget the exact number or order of
the arguments, which file a function is defined in, or even the exact name of
the function. So this VB routine reads all the files, finds the function
elements, puts them into collections, and then, at this point, writes the
collections to text files.
Later I'd like to be able to type something like "PasteFunction" and have it
find the function and its argument list and copy them to the clipboard, so I
could paste them into whatever text editor I'm in (usually VLIDE). Kind of
like how IntelliSense in VB shows you which arguments a function expects
when you are calling it.

The tricky part is that in Lisp a single expression can be divided over more
than one line, so the function name and argument list could be on one line or
spread over many, and more than one function could even be defined on one
line. So the parsing algorithm has gotten a wee bit complicated, trying to
allow for all possibilities. I'm probably not being too intelligent about how
I've written that part either, but it's so convoluted I doubt anyone would
want to take the time to look at it with an eye to possible improvement...
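One way around the line-spanning problem, offered as a swapped-in technique rather than Mark's algorithm, is to stop working line by line: treat the whole file as one string, break it into parenthesis and symbol tokens, and read each `(defun name (args ...)` form straight from the token stream. This Python sketch assumes naive `;` comment stripping and no parentheses or semicolons inside string literals; `find_defuns` is a hypothetical name. It also splits the AutoLISP argument list at `/`, which separates arguments from local variables.

```python
import re

# A token is "(" or ")" or a run of characters that are neither
# whitespace nor parentheses.
TOKEN = re.compile(r"[()]|[^\s()]+")


def find_defuns(source):
    """Return (name, args, locals) for every (defun name (arg-list) ...) form.

    Line breaks are irrelevant because the whole file is tokenized at once,
    so a form may span many lines or share a line with other forms.
    """
    source = re.sub(r";[^\n]*", "", source)  # naive: drop ; comments to end of line
    tokens = TOKEN.findall(source)
    results = []
    for i in range(len(tokens) - 3):
        if tokens[i] != "(" or tokens[i + 1].lower() != "defun":
            continue
        name = tokens[i + 2]
        if tokens[i + 3] != "(":
            continue  # malformed form: no argument list after the name
        arglist = []
        j = i + 4
        while j < len(tokens) and tokens[j] != ")":
            arglist.append(tokens[j])
            j += 1
        if "/" in arglist:
            # AutoLISP: symbols after "/" are local variables, not arguments.
            cut = arglist.index("/")
            results.append((name, arglist[:cut], arglist[cut + 1:]))
        else:
            results.append((name, arglist, []))
    return results
```

For example, `find_defuns("(defun add2 (a\n  b) (+ a b))")` finds `add2` with arguments `a` and `b` even though the argument list is split across two lines.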

> However my thoughts at the moment are:
>
> 1. Sometimes you can get a performance improvement if you pull an *entire*
> file into a string variable in one read and then processing the lines from
> memory rather than disk.

I'm reading each file into a collection all at once, then processing the
collection items. I'm using a collection rather than an array since
collections are so easy to work with. I'm not aware of a speed hit from using
a collection vs. an array; are you? I also don't think there is any time issue
at all with the actual reading of the file. I think the time is consumed later
in the program, in processing each line to look for the various elements.

> 2. Off the top of my head CopyMemory is an api that can specifically help
> with string processing.

That sounds like a likely candidate. I'll have to study it after re-studying
the Balena article.
> 3. I would agree that doing an insertion sort on a collection is likely
> slower than doing a sort after the collection is complete. But with the
> caveat that depending on what you're trying to do a collection may not be
> the right thing to be using in the first place!

In this case the collection seems useful to me; it's where I'm storing the
results of my search. I just think my method of inserting is ill-conceived.
I'll try the alternative method (sorting afterward).

> 4. http://www.vb2themax.com is an excellent resource for string handling
> and sorting. Have a good look around the Code Bank, the Optimization Bank
> and the Article Bank, but in particular see the article "Play VB's Strings".

I remember reading that article some time ago; that's probably where I heard
about VB's inefficiency at string handling. But as with so many things, I
obviously didn't memorize or fully absorb it all... time to go back and
study that one.

Thanks as always for your inspirations.
Mark
Message 4 of 13

Anonymous
Consider using regular expressions to parse the Lisp functions. Set a
project reference to Microsoft VBScript Regular Expressions, found in
vbscript.dll.
Check out
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vtoriregularexpressionsobjectpropmeth.asp
and
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vspropattern.asp.
Also look at http://www.vb2themax.com/SearchWord.asp?Search=regexp for
ideas. The key will probably be to use non-greedy matching, which is done with "?".
Message 5 of 13

Anonymous
I'll check those out, thanks!
Mark

"ljb" <.> wrote in message
news:EE73852DC7797CC154544E8A519D788F@in.WebX.maYIadrTaRb...
> Consider using Regular Expressions to parse the lisp functions. Set a
> project reference to Microsoft VBScript Regular Expressions found in
> vbscript.dll.
> Check out
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vtoriregularexpressionsobjectpropmeth.asp
> and
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script56/html/vspropattern.asp.
> Also look at http://www.vb2themax.com/SearchWord.asp?Search=regexp for
> ideas. The key will probably to use non-greedy matching done with "?".
>
>
Message 6 of 13

Anonymous
Here is a little VBScript that demos the regular expression. Save the
following lines as xxx.vbs, double-click the file, and see what you get. Then
take out the "?" characters in the pattern and see the difference.

mylisp = "(defun my_function1 (temp)...(defun my_function2 (temp1 temp2)....)"

set re = createobject("VBScript.RegExp")
re.IgnoreCase = true
re.Global = true
re.pattern = "\(defun\b.*?\(.*?\)"

for each match in re.execute(mylisp)
wscript.echo match
next
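The same experiment can be run without Windows Script Host using Python's `re` module, which shares the `*?` lazy-quantifier syntax; this is an illustrative translation, not ljb's code. The lazy pattern stops at the first `)` it can, giving one match per defun. With the `?` characters removed, `.*` becomes greedy and the single match runs from the first `(defun` to the last `)` in the string.

```python
import re

mylisp = "(defun my_function1 (temp)...(defun my_function2 (temp1 temp2)....)"

# Same pattern as the VBScript demo: ".*?" is lazy and stops as early as possible.
lazy = re.compile(r"\(defun\b.*?\(.*?\)", re.IGNORECASE)
# With the "?" removed, ".*" is greedy and extends to the last possible ")".
greedy = re.compile(r"\(defun\b.*\(.*\)", re.IGNORECASE)

lazy_matches = lazy.findall(mylisp)      # no groups, so whole matches are returned
greedy_matches = greedy.findall(mylisp)

for m in lazy_matches:
    print("lazy:  ", m)
for m in greedy_matches:
    print("greedy:", m)
```

Like VBScript's RegExp, Python's `.` does not match a newline by default, so neither pattern crosses line breaks unless told to.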
Message 7 of 13

Anonymous
I'm under the gun today, but I'll look into that ASAP.
Thanks again,
Mark

"ljb" <.> wrote in message
news:7B1514DC78EB0D9A6DA6D4278014BDEA@in.WebX.maYIadrTaRb...
> Here is a little vbs that will demo the Regular Expression. Save the
> following lines as xxx.vbs, double click the file and see what you get. Take
> out the "?" characters in the pattern and see the difference.
>
> mylisp = "(defun my_function1 (temp)...(defun my_function2 (temp1 temp2)....)"
>
> set re = createobject("VBScript.RegExp")
> re.IgnoreCase = true
> re.Global = true
> re.pattern = "\(defun\b.*?\(.*?\)"
>
> for each match in re.execute(mylisp)
> wscript.echo match
> next
>
>
Message 8 of 13

Anonymous
I couldn't resist, since I think regular expressions are so cool. This is a
slight variation of the previous code I posted: "defun" is now matched by a
non-capturing group. This example reads all of acad2000doc.lsp into a string
variable and lists all the defun statements.

------------- demo.vbs -----------------

mylisp = CreateObject("Scripting.FileSystemObject") _
.OpenTextFile("C:\ACAD2000\SUPPORT\acad2000doc.lsp", 1) _
.ReadAll

set re = CreateObject("VBScript.RegExp")
re.IgnoreCase = true
re.Global = true
re.pattern = "(?:\bdefun\b)(.*?\(.*?\))"

for each match in re.execute(mylisp)
wscript.echo match.submatches(0)
next
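For comparison, here is the same non-capturing pattern in Python's `re` module, as a sketch with a made-up sample source. One indexing difference is worth noting: Python numbers capture groups from 1 because group 0 is the whole match, so VBScript's `SubMatches(0)` corresponds to `m.group(1)` here.

```python
import re

source = """(defun c:zoomall ()
  (command "_.zoom" "_all"))
(defun add-layer (name color / old) (princ))"""

# Same pattern as demo.vbs. (?:...) groups "defun" without capturing it,
# so the only capturing group is (.*?\(.*?\)).
pattern = re.compile(r"(?:\bdefun\b)(.*?\(.*?\))", re.IGNORECASE)

captured = []
for m in pattern.finditer(source):
    # m.group(0) is the whole match (VBScript: Match.Value).
    # m.group(1) is the first capturing group (VBScript: SubMatches(0)).
    captured.append(m.group(1))

print(captured)
```

The captured text keeps the leading space after "defun", exactly as the VBScript version does; call `.strip()` on it if that matters.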
Message 9 of 13

Anonymous
Wow dude, that is amazing,
on several levels...

I had no idea you could do this (creating and using objects on the fly
without declaring and setting variables):
> mylisp = CreateObject("Scripting.FileSystemObject") _
> .OpenTextFile("C:\ACAD2000\SUPPORT\acad2000doc.lsp", 1) _
> .ReadAll
I thought I had to declare a variable and set it to an FSO instance, then
another one for the TextStream object, and then call .ReadAll on that object
to get the string.

A lingering question, though: how is the file closed without a TextStream
object? Or was this just a snippet to show the RegExp usage?
Or does that line somehow close the file by itself?

The other amazing thing is this sequence:
> for each match in re.execute(mylisp)
> wscript.echo match.submatches(0)
> next
If I cut and paste your code into a sub in a VB module without Option
Explicit, it runs (amazingly quickly, too).
(I substituted Debug.Print for WScript.Echo.)

The part that amazes me is that, since match is not declared and is therefore
a Variant, I can see that in the loop it is set to each Match object in the
Matches collection returned by .Execute.

But since submatches is also undeclared, I don't see how it works as a
property or method of the Match object, which doesn't have a SubMatches
property (only FirstIndex, Length, and Value).

And further, I don't understand the (0) after submatches, as if it were an
item in an indexed array or collection (which it isn't; the Match is
supposedly just an object with three properties).

Since the default property of the Match object is .Value, I can see how
"wscript.echo match" might work, but
"wscript.echo match.submatches(0)" completely baffles me!

But it works! That's what I can't figure out.

If, however, I add Option Explicit and declare the variables properly, it
doesn't work!
What's going on there?

If I didn't have your example and were just going by the help files on
RegExp, I would have written it like this:
Dim mylisp As String
Dim re As VBScript_RegExp_10.RegExp
Dim colMatches As VBScript_RegExp_10.MatchCollection
Dim oMatch As VBScript_RegExp_10.Match
' ....read file into mylisp....
Set re = New VBScript_RegExp_10.RegExp
re.IgnoreCase = True
re.Global = True
' ....set re.Pattern....
Set colMatches = re.Execute(mylisp)
For Each oMatch In colMatches
    Debug.Print oMatch.Value   ' or just: Debug.Print oMatch
Next
Set oMatch = Nothing
Set colMatches = Nothing
Set re = Nothing

And in fact, with Option Explicit on, that syntax also works.

Is the first syntax something coming from VBScript itself?
I know nothing about VBScript and don't even understand how one can mix
languages in the same program and have it work (except in the case of
setting a reference and then using objects from the library)!

So anyway, thanks for upping the learning curve for me...
Lots of studying to do on those regular expressions. I can see they're a
powerful tool, but at first glance the syntax is a bit confusing...
I just need to re-read the help file several more times to get it to sink in.

Thanks again,
this has been a real eye-opener!
Mark

"ljb" <.> wrote in message
news:4D102A3989848627E5533159F219CFD3@in.WebX.maYIadrTaRb...
> I couldn't resist since I think Regular Expressions are so cool. This is a
> slight variation of the previous code I posted.
Message 10 of 13

Anonymous
I often use objects this way, although it can make the code less readable and
is discouraged by some. My understanding is that an object variable holds a
reference (a pointer) to the object. If no variable is ever created and set to
the object, the object is released as soon as the statement that uses it
finishes (COM objects are reference-counted rather than garbage-collected).
I believe destroying a file object always closes the file, but we weren't
writing to it, so nothing was lost.

The SubMatches collection only became available with VBScript 5.5. I think
you are explicitly referencing the VBScript Regular Expressions 1.0 library,
which is why it doesn't work. VBScript 5.5 can be downloaded standalone or
installed with IE 5.5. I suspect you already have 5.5 or 5.6 on your PC, and
it is used by default when you don't reference a specific version.

SubMatches is a collection of strings, and its items are indexed starting
at 0. The pattern has two groups, but "?:" makes the first one non-capturing,
so only the second one is returned, as SubMatches(0).

enjoy
LJB
Message 11 of 13

Anonymous
I think if you got really creative you could write a match pattern that
retrieved the function name in SubMatches(0) and each of the parameters in
SubMatches(1), (2), ... Another option might be to use Split(SubMatches(0), " ")
or something similar to get each parameter.
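Both ideas can be combined: capture the name and the raw argument list as two groups, then split the list. This is a Python sketch; the `DEFUN` pattern and the `signatures` function are hypothetical and assume the argument list contains no nested parentheses. Because a regular expression has a fixed number of groups, splitting the captured list is the practical way to get a variable number of parameters.

```python
import re

# Group 1: the function name. Group 2: everything inside the argument-list
# parentheses. \s and [^)] both match newlines, so a form may span lines.
DEFUN = re.compile(r"\(defun\s+([^\s()]+)\s*\(([^)]*)\)", re.IGNORECASE)


def signatures(source):
    """Return (name, [params...]) for each defun found in source."""
    result = []
    for m in DEFUN.finditer(source):
        name, arglist = m.group(1), m.group(2)
        # split() with no argument collapses runs of spaces and newlines,
        # unlike Split(..., " "), which yields empty strings for double spaces.
        result.append((name, arglist.split()))
    return result
```

The `/` stays in the parameter list; in AutoLISP it separates arguments from local variables, so a caller may want to cut the list there.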
Message 12 of 13

Anonymous
Hi ljb,

"ljb" <.> wrote in message
news:002C9870614E976D669C33E689B19710@in.WebX.maYIadrTaRb...
> I often use objects this way however it can make the code less readable and
> is discouraged by some. My understanding is that an object variable contains
> a pointer to the object. If this variable is never created and a reference
> set to the object, garbage collection happens as soon as the object
> finishes. I believe destroying a file object always closes the file but we
> weren't writing to it so nothing was lost.

I wasn't aware of that. My concern wasn't losing data since, as you say,
we're not writing to the file; I was just concerned about leaving open file
handles lying about. If what you say is true, then no problem.
>
> The submatches collection only became available with vbscript 5.5. I think
> you are explicitly declaring references to vbscript 1.0 therefore it doesn't
> work.

Yes, I saw both of them in the References list but didn't know which one to
use. I guess I can just use the newer one now and dispense with 1.0?

> The submatches object is a collection of strings and every item in a
> collection has an index starting at 0.

That's interesting; I thought collections were 1-based?
I thought colCollection(1) was the first item in a collection, as opposed to
arrays, where aArray(0) is the first item (depending on how it was declared).

Thanks again for some thought-provoking information.
Mark
Message 13 of 13

Anonymous
And yet another cool idea from the desk of ljb!!!
Thanks
Mark

"ljb" <.> wrote in message
news:41757E645A0397E5B400C6F5D959E75A@in.WebX.maYIadrTaRb...
> I think if you got really creative you could write a match pattern that
> retrieved the function name in submatch(0) and each of the parameters in
> submatch(1)...(2)... Another option might be to use split(submatch(0)," ")
> or something to get each parameter.
>
>