How to optimize string handling in Visual Basic 6.0
String handling in Visual Basic is slow if done
the wrong way. You can make string operations significantly faster
by following a few easy tricks.
Faster strings with VB6
Visual Basic 6.0 contains a large selection of string handling
functions, such as Left, Mid, Right, Len, Asc and InStr. They offer
a powerful way to operate on strings. Unfortunately, many of the
string functions are not optimized for speed. This is why VB6 apps
may run slower than necessary.
Fortunately, you can overcome many of the speed limitations by
clever coding. This article shows a number of good tricks to add
speed to string-intensive applications. The tricks use pure VB6
code. No extra run-time files or API calls are necessary.
Who should read this article
These tips are based on Visual Basic 6.0 and variable-length strings.
They're most useful with string-intensive programs that read, parse
or manipulate large amounts of text. The performance gains from
using these techniques are significant if you're executing the calls
thousands or hundreds of thousands of times. If you're just occasionally
writing and reading a few strings outside of loops, these tips won't
help you much. While the tips work best for VB6, some of them are
generic in that they also apply to earlier and later versions of
VB.
Why are the VB6 strings so slow?
Perhaps the biggest bottleneck is that VB makes copies of the string
data when doing some of the operations. Even when you're just reading
strings (and not planning to make any modifications), you can easily
end up making a large number of copies. The copying costs you time
if string processing is an intensive part of your program. Another
reason is that some of the widely used functions are implemented
in a non-straightforward way. They may be doing more work than what
is required for your task. Fortunately, you can often replace an
advanced function with a simpler and faster alternative.
The empty string
Is the "" expression often found in your code? Beware!
So many CPU cycles are wasted for such a string! Testing and assigning
empty strings is an easy place for optimization.
Checking for empty string
TIP! It's often necessary to test for an
empty string. The usual ways are these:
If Text$ = "" Then
If Text$ <> "" Then
However, VB executes
the following equivalent statements much faster.
If LenB(Text$) = 0 Then
If LenB(Text$) <> 0 Then
The replacement is essentially risk-free. Your
code executes the same as before, only faster.
VB's implementation of LenB is fast. LenB is the byte equivalent
of Len. Len is actually implemented as LenB\2. That makes LenB
faster than Len, so you should use it where possible. VB3 and VB.NET
don't have the LenB alternative. In these languages you should use
Len.
Note that we use the <> operator, not >. <> simply
tests for inequality, while > tests more. As Len/LenB never return
a negative number, we can safely use this test.
Assigning an empty string to a variable
TIP! This is the usual way to clear a string
variable.
Text$ = ""
What a waste! First of all, the string
"" takes 6 bytes of RAM each time you use it. Consider
the alternative:
Text$ = vbNullString
So what is this? vbNullString is a special VB constant
that denotes a null string. The "" literal is an
empty string. There's an important difference. An empty string
is a real string. A null string is not. It is just a zero. If you
know the C language, vbNullString is the equivalent of NULL.
For most purposes, vbNullString is equivalent to "" in
VB. The only practical difference is that vbNullString is faster
to assign and process and it takes less memory.
If you call some non-VB API or component, test the calls with vbNullString
before distributing your application. The function you're calling
might not check for a NULL string, in which case it might crash.
Non-VB functions should check for NULL before processing a string
parameter. With bad luck, the particular function you're calling
does not do that. In this case, use "". Usually APIs do
support vbNullString and they can even perform better with it!
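The difference between a null string and an empty string can be made visible with a short sketch. It relies on StrPtr, a hidden VB6 function that returns the address of the string data. The procedure and variable names are made up for illustration.

```vb
Option Explicit

Sub DemoNullVersusEmpty()
    Dim NullText As String
    Dim EmptyText As String

    NullText = vbNullString
    EmptyText = ""

    ' For ordinary VB code the two behave the same:
    ' they compare as equal and both have zero length.
    Debug.Print (NullText = EmptyText)
    Debug.Print LenB(NullText); LenB(EmptyText)

    ' StrPtr shows the difference. A null string has no data
    ' at all, so its pointer is zero. An empty string is a real
    ' (zero-length) string with an address of its own.
    Debug.Print (StrPtr(NullText) = 0)
    Debug.Print (StrPtr(EmptyText) = 0)
End Sub
```

A zero pointer is exactly what a careless non-VB function may trip over, which is why the calls are worth testing.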
Beware of variants!
It's a simple thing but often overlooked. All variables, parameters
and functions should have a defined data type. If the data is a
string, then the data type should be defined as string. If you don't
give a data type, you're using a variant. The variant data type
has its uses but not in string processing. A variant means performance
loss in most cases.
TIP! So add those Option Explicit
statements now and Dim all variables with a decent data type. Review
your functions and ensure that they define a return data type.
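As a rough sketch, here is the same small function written first with implicit variants and then with explicit types. The function names are hypothetical and only serve the comparison.

```vb
Option Explicit

' Untyped: the parameter, the local variable and the return
' value are all Variants, even though only strings are involved.
Function FirstWordSlow(Text)
    Dim Pos
    Pos = InStr(Text, " ")
    If Pos = 0 Then
        FirstWordSlow = Text
    Else
        FirstWordSlow = Left(Text, Pos - 1)
    End If
End Function

' Typed: every value has a defined data type and the function
' declares that it returns a String.
Function FirstWord(ByRef Text As String) As String
    Dim Pos As Long
    Pos = InStr(Text, " ")
    If Pos = 0 Then
        FirstWord = Text
    Else
        FirstWord = Left$(Text, Pos - 1)
    End If
End Function
```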
Dollars that make your program run faster
TIP! The following functions are suboptimal
if you're using them on strings.
Left(), Mid(), Right(), Chr(), ChrW()
UCase(), LCase(), LTrim(), RTrim(), Trim(),
Space(), String(), Format(), Hex(), Oct(),
Str(), Error
These are the dreaded variant functions. They take
a variant, they return a variant. These functions are OK to use
if you're processing variants. This is the case in database programming,
where your input may contain Null values.
So what's all that variant stuff in string processing? It's fat.
If you're dealing with strings, forget about the variants. Use the
string versions instead:
Left$(), Mid$(), Right$(), Chr$(), ChrW$()
UCase$(), LCase$(), LTrim$(), RTrim$(), Trim$(),
Space$(), String$(), Format$(), Hex$(), Oct$(),
Str$(), Error$
Replace or not?
The following tip might be obvious, but it wasn't to us. It makes
no sense to call Replace if you're not likely to replace anything.
If a replace is unlikely, verify first (say, with InStrB) that there
is something you need to replace.
If InStrB(Text$, ToBeReplaced$) <> 0 Then
Text$ = Replace(Text$, ToBeReplaced$, "xyz")
End If
If a replace is likely or certain
to occur, there is no need to call InStrB. It will just add an extra
burden. Notice that it's not necessary to add $ in the call to Replace.
This is an exception to the $ rule, presented above.
Use the wide AscW and ChrW$
TIP! VB works in Unicode internally. Every
string is in Unicode, which takes 2 bytes per character.
You don't have to be writing international applications to take
advantage of a couple of Unicode tricks. Consider the following
functions: Asc(), Chr$(). What's wrong with them? They
are the slower versions. If you're concerned about speed, use the
wide versions instead: AscW(), ChrW$().
AscW() is not the same as Asc(). They return different values.
ChrW$() is different from Chr$() because they take different parameter
values. However, AscW() equals Asc() and ChrW$() equals Chr$() when
working with characters from ASCII/Ansi 0 to 127.
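The following sketch shows where the narrow and wide functions agree and where they part ways. The procedure name is made up, and the euro sign is just one convenient example of a character outside the 0 to 127 range.

```vb
Option Explicit

Sub DemoWideFunctions()
    Dim Euro As String

    ' In the range 0 to 127 both pairs agree.
    Debug.Print (Asc("A") = AscW("A"))
    Debug.Print (Chr$(65) = ChrW$(65))

    ' Outside that range AscW returns the Unicode value of the
    ' character, while Asc returns a value that depends on the
    ' ANSI code page of the system.
    Euro = ChrW$(8364)        ' Euro sign, U+20AC
    Debug.Print AscW(Euro)
    Debug.Print Asc(Euro)
End Sub
```

If your input can contain such characters, decide which of the two values your code really expects before replacing Asc with AscW.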
Use constants
Built-in string constants
TIP! Instead of calling Chr$()/ChrW$()
on the following numeric values, use the predefined string constants.
They will save you from the function call.
vbNullChar 0
vbBack 8
vbTab 9
vbLf 10
vbVerticalTab 11
vbFormFeed 12
vbCr 13
vbCrLf 13+10
vbNewline 13+10
"" 34
For some reason, vbNewline is a little
bit faster than vbCrLf.
The last example ("") is not actually a constant but
an escape sequence. You can use "" anywhere in a string
to represent a quotation mark. The alternative is Chr(34), which
was required in some early BASIC versions where the ""
syntax didn't exist.
You can also define other character values to avoid repeated
calls to Chr$()/ChrW$(). If the character value is in the range
ASCII 0-31, you need to define it as a variable and assign the
correct character value before use.
Dim BEL As String
BEL = ChrW$(7) ' The BEL character, or ^G
For other characters you can simply use a constant.
Const Percentage = "%"
Unnecessary Asc/AscW
TIP! It's obvious, but calling Asc/AscW
on a string constant makes no sense, as the value returned never
changes. Instead of Asc("A"), use the value 65. Better
yet, define a constant such as
Const ascA = 65
and use it instead of 65 for more legibility. As it
happens, VB.NET compiles Asc("A") better, but since we're
in VB6, we need to define this constant.
Your own string constants
If the same string exists in more than one location in your project,
it will also exist in several locations in the executable file,
as far as VB6 is concerned (VB.NET joins duplicated strings during
compilation).
You can optimize by defining your strings as constants and referencing
the constant where you need the string value. This way you save
space as each constant gets stored only once. Besides, if you ever
consider localizing your program, you have a useful list of string
constants to give to the translator.
There is a nasty exception. It doesn't save any space to define
constants by other constants.
Const MSG1 = "Hello, "
Const MSG2 = "world!"
Const MSG3 = MSG1 & MSG2
In this case you will
actually store the same text twice in the executable. All of MSG1, MSG2
and MSG3 will get stored - not something you wanted to achieve!
If you want to save space, concatenate MSG1 & MSG2 at run-time.
For speed, store it in a variable for reuse.
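One possible way to do this is sketched below. The module-level variable m_sGreeting and the function Greeting are hypothetical names, not part of the original example.

```vb
Option Explicit

Const MSG1 = "Hello, "
Const MSG2 = "world!"

' Holds the joined text after the first call.
Private m_sGreeting As String

Function Greeting() As String
    ' Concatenate at run-time, and only once.
    If LenB(m_sGreeting) = 0 Then
        m_sGreeting = MSG1 & MSG2
    End If
    Greeting = m_sGreeting
End Function
```

Only MSG1 and MSG2 are stored as constants here. The joined text exists in memory only after Greeting has been called.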
Also notice that the above applies to string constants only. Numeric
constants are also computed and stored in the executable, but string
constants are more likely to demand more space (6 bytes overhead
+ 2 bytes per character).
Store strings in .res files
When compiling to an executable file, VB stores (most) string literals
in Unicode, requiring 2 bytes per character. If you want to store
your strings 1 byte per character, use resource files instead. This
might reduce your executable size considerably if the amount of
string data is large.
Resource files are also handy for storing long strings or strings
that may be subject to localization.
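Reading a string from a resource file takes one call to the built-in LoadResString function. The sketch assumes that the project's .res file contains a string table entry with the ID 101; both the ID and the constant name are made up.

```vb
Option Explicit

' Assumption: the resource file has a string with this ID.
Const IDS_WELCOME As Long = 101

Sub ShowWelcome()
    Dim Message As String
    ' LoadResString raises a run-time error if the ID is missing.
    Message = LoadResString(IDS_WELCOME)
    MsgBox Message
End Sub
```

If the same string is needed repeatedly, load it once and keep it in a variable.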
Comparing strings
Comparing strings against each other may take longer than you expected.
Here are a few tricks.
Comparing the leftmost character
Here are two unoptimized ways to branch on the first character
in a string.
' Case 1
If Left$(Text$, 1) = "A" Then
' Case 2
Select Case Left$(Text$, 1)
Case "A"
Case "B"
End Select
Here are the faster
alternatives.
' Case 1
If LenB(Text$) <> 0 Then
If AscW(Text$) = 65 Then
' Case 2
If LenB(Text$) <> 0 Then
Select Case AscW(Text$)
Case 65 ' AscW("A")=65
Case 66
End Select
Calling AscW() is faster
than first calling Left$(), then comparing the result to another
string. There's a caveat, however. AscW() on an empty or null string
is a run-time error. That's why you must first call LenB()
to rule out that possibility. You can leave out the call to LenB()
only if you're certain that the string contains at least one character.
The Select Case structure offers an additional bonus. Having single
numbers in the Case conditions is less time-intensive than repeatedly
comparing against a string.
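Put together as a complete function, the technique might look like this. IsVowelFirst is a hypothetical example that checks whether a string starts with an English vowel.

```vb
Option Explicit

Function IsVowelFirst(ByRef Text As String) As Boolean
    ' Rule out the empty string first, as AscW would fail on it.
    If LenB(Text) <> 0 Then
        Select Case AscW(Text)
        Case 65, 69, 73, 79, 85         ' A E I O U
            IsVowelFirst = True
        Case 97, 101, 105, 111, 117     ' a e i o u
            IsVowelFirst = True
        End Select
    End If
End Function
```

No temporary string is created at any point. The function only reads one character code and compares numbers.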
Comparing a character in the middle of a string
Similar to the above trick, this is the way to check for a character
in the middle of a string.
If AscW(Mid$(Text$, index)) = 65 Then
Note that index must be less than or equal to Len(Text$),
otherwise you get a run-time error. In this case, call Mid$(,
index) instead of Mid$(, index, 1). The third
parameter (length) actually makes the call slower, although one
could assume the opposite.
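If you can't be sure that the index is valid, a small wrapper can do the range check. This is only a sketch, and IsCharAt is a made-up name.

```vb
Option Explicit

Function IsCharAt(ByRef Text As String, ByVal Index As Long, _
                  ByVal CharCode As Integer) As Boolean
    ' Mid$ fails on an index below 1, and AscW fails on the
    ' empty string that Mid$ returns past the end of the text.
    If Index >= 1 Then
        If Index <= Len(Text) Then
            IsCharAt = (AscW(Mid$(Text, Index)) = CharCode)
        End If
    End If
End Function
```

A call such as IsCharAt(Text$, 3, 65) then tests whether the third character is "A" and simply returns False for an index outside the string.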
Compare in binary
TIP! Whenever you can, use binary comparison.
This is VB's default. The following cases are a lot slower than
their default binary alternatives:
Option Compare Text
StrComp(, , vbTextCompare)
InStr(, , , vbTextCompare)
If you're after a case-insensitive StrComp(), it may
make sense to call LCase$, if you can manage by calling it on
one parameter only.
StrComp(Text1$, "abc", vbTextCompare) ' Slower
StrComp(LCase$(Text1$), "abc", vbBinaryCompare) ' Faster
In the following case, the
two calls to LCase$ remove the performance gain you got above.
StrComp(LCase$(Text1$), LCase$(Text2$), vbBinaryCompare)
Bear in mind that StrComp(,,vbTextCompare) is more
than just a case-insensitive comparison. It's actually built for
sorting, not comparing for equality. In many cases, such a locale-dependent
textual comparison is overkill and can even lead to subtle errors.
Check for existence with InStrB
InStr is a nice function to find a string inside another one. Normally
you use the wide-character version (plain InStr). However, there
is an optimization with the byte version (InStrB). If you are just
going to check whether a string exists inside the other but don't
care about the location, use the following code:
If InStrB(Text$, SearchFor$) <> 0 Then
Since you only compare the return value against
zero, you don't need to worry about conversions between byte-based
indices and character indices. This is not the whole story, however.
You need to be aware of the following catches:
What does this mean? Use InStrB to optimize only when you fully
understand how it works.
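One thing worth understanding is that InStrB compares bytes, not characters. As far as we can tell, this means it can report a match that starts in the middle of one character and ends in the middle of the next. The sketch below is built to provoke such a case; the procedure name and the test string are made up for the purpose.

```vb
Option Explicit

Sub DemoInStrBMisaligned()
    Dim Text As String

    ' Two non-ASCII characters. In memory they occupy the
    ' bytes 00 41 00 01 (low byte first).
    Text = ChrW$(&H4100) & ChrW$(&H100)

    ' "A" is stored as the bytes 41 00. No character in Text
    ' is an "A", so the character-based search finds nothing.
    Debug.Print InStr(Text, "A")

    ' The byte sequence 41 00 does occur, however, straddling
    ' the two characters. A nonzero result here is a false
    ' positive from the point of view of characters.
    Debug.Print InStrB(Text, "A")
End Sub
```

With plain ASCII text the high byte of every character is zero, so this kind of mismatch is unlikely, but it's worth keeping in mind when the input may contain arbitrary Unicode.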
Like
The Like operator is not particularly fast. Consider alternatives.
We don't have a generic rule to follow here. You need to measure
the performance differences between your alternatives. Here is one
rule though. It applies if you're looking for a certain string inside
another one.
If Text$ Like "*abc*" Then
-> If InStrB(Text$, "abc") <> 0 Then
Read above for
more discussion of InStrB before using this approach. You may also
use InStr.
String parameters
Procedure string parameters differ from numeric parameters in that
with strings, the chosen parameter passing convention makes a real
performance difference.
Pass strings ByRef
How should you define procedure parameters for calls from within
the same project?
ByVal is slow for string parameters. ByVal makes a copy of the
string on every call. The good side is that a ByVal parameter is
safe to modify: the modifications aren't passed back to the callers.
ByRef is faster because the string doesn't get copied. The drawback
is that you have to be careful. If your intention is not to return
a value in the ByRef parameter back to the caller, you must not accidentally
write to this parameter.
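The two conventions can be compared side by side in a short sketch. Both function names are hypothetical.

```vb
Option Explicit

' ByVal: the function gets its own copy of the string and
' may modify it freely. The caller's variable is unaffected.
Function TrimmedLength(ByVal Text As String) As Long
    Text = Trim$(Text)
    TrimmedLength = Len(Text)
End Function

' ByRef: no copy is made. The function only reads the
' parameter, so the caller's variable stays intact.
Function StartsWithDigit(ByRef Text As String) As Boolean
    If LenB(Text) <> 0 Then
        Select Case AscW(Text)
        Case 48 To 57       ' "0" to "9"
            StartsWithDigit = True
        End Select
    End If
End Function
```

Had TrimmedLength been declared ByRef, the call would have trimmed the caller's string as a side-effect. That is the kind of accident to watch out for.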
Use ByRef instead of function return value
There's also an optimization trick for returning a string value.
Returning a string as the function return value is the normal practice.
However, returning a string in a ByRef parameter is faster.
The ByRef trick for return values applies to both functions and
Property Get procedures. Here's the usual (and slower) way.
Property Get Name() As String ' Alternative: Function Name() As String
Name = m_sName
End Property
This way is faster if you have to make a large number
of calls.
Sub GetName(ByRef Name_out As String)
Name_out = m_sName
End Sub
It's often considered bad programming style to return
values in parameters. Normally procedures should not cause side-effects
by modifying their ByRef parameters. However, if you want speed,
you sometimes have to reject accepted programming practices to win
a few CPU cycles. Thus, the optimization objective might justify
the loss of style. You can use ByRef, but you should indicate why
you're using it. For example, you can mark all output parameters
with the string "out" (as in Name_out), or write a comment saying ByRef is
used for speed.
Stick to ByVal for out-of-process calls
There is one case where ByRef is slower than ByVal. This happens
when passing ByRef to an out-of-process server. The variable has
to be marshalled twice, once going into the method and once returning.
The implication is to use ByVal for your public server interfaces.
The dollar sign
TIP! In this article we've used the $ sign
to denote a string variable. This is actually an obsolete way. We
don't recommend the $ sign for string variables. It's used here
only for the sake of clarity. In real code, you should define your
variables with code such as Dim Text As String.