We're encouraged to believe that CRT data conversion functions are going to execute faster than anything we might write by hand. Unfortunately, this is not always the case. It certainly isn't the case for wcstoul, which converts a wide-character (Unicode) string to an unsigned long integer.
Granted, wcstoul does a lot: it skips leading whitespace, accepts an optional sign, handles any radix from 2 to 36 (or works the radix out from the prefix when you pass 0), and reports overflow through errno. And when you're using the CRT from the multi-threaded DLL, you've got to expect a little overhead. But I was calling wcstoul inside a tight loop, and was surprised to find that a simple homegrown converter took 5% of the time that wcstoul takes. That's right: not 5% less time, i.e., slightly faster, but 95% less time, i.e., roughly 20 times faster.
Of course, I determined the bottleneck by profiling the code, so the real message here is that you don't know what needs to be optimized, and you don't know if your optimization is actually helping, until you measure.
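To make "measure" concrete, here is a minimal timing sketch. It assumes C++11's <chrono> and uses a plain wall-clock timer rather than a profiler. ParseDecimal, ParseWithCrt, TimedRun, and TimeConversions are hypothetical names made up for this illustration; they are not the code I profiled.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical hand-written converter: decimal digits only, with no
// whitespace, sign, radix, or overflow handling.
inline unsigned long ParseDecimal(const wchar_t * psz)
{
    unsigned long nResult = 0;
    for (; *psz >= L'0' && *psz <= L'9'; ++psz)
        nResult = nResult * 10 + static_cast<unsigned long>(*psz - L'0');
    return nResult;
}

// The CRT routine wrapped to the same signature, fixed at radix 10.
inline unsigned long ParseWithCrt(const wchar_t * psz)
{
    return std::wcstoul(psz, NULL, 10);
}

struct TimedRun
{
    unsigned long nChecksum; // sum of every converted value
    double dSeconds;         // wall-clock time for the whole run
};

// Converts every input nPasses times and reports the elapsed time.
// Returning the checksum keeps the conversions observable, so the
// optimizer cannot discard the loop being timed.
template <typename TConverter>
TimedRun TimeConversions(const std::vector<std::wstring> & inputs,
    int nPasses, TConverter convert)
{
    typedef std::chrono::steady_clock Clock;
    unsigned long nChecksum = 0;
    const Clock::time_point start = Clock::now();
    for (int nPass = 0; nPass < nPasses; ++nPass)
        for (std::size_t i = 0; i < inputs.size(); ++i)
            nChecksum += convert(inputs[i].c_str());
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    TimedRun run = { nChecksum, elapsed.count() };
    return run;
}
```

Call TimeConversions once with ParseWithCrt and once with ParseDecimal over the same inputs, check that the two checksums match, and only then compare dSeconds. Use an optimized build and enough passes that the run takes a measurable amount of time.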
In case you're curious, here's the code, with radix support removed for simplicity:
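#include <cstddef> // NULL

// Appends one decimal digit to rnInteger. Returns false, leaving rnInteger
// untouched, if chDigit is not in '0'..'9'. Overflow is not detected.
template <typename TChar, typename TInteger>
inline bool ConvertDigitToInteger(TChar chDigit, TInteger & rnInteger)
{
    if (chDigit < static_cast<TChar>('0') ||
        chDigit > static_cast<TChar>('9'))
        return false;
    rnInteger = rnInteger * 10 +
        static_cast<TInteger>(chDigit - static_cast<TChar>('0'));
    return true;
}

// Converts the leading run of decimal digits in [itInput, itInputEnd).
// Unlike wcstoul, it does not skip whitespace or accept a sign. If
// pitInputStop is non-NULL, it receives the position of the first
// character that was not converted.
template <typename TInteger, typename TInputIter>
inline TInteger ConvertDigitsToInteger(TInputIter itInput,
    TInputIter itInputEnd, TInputIter * pitInputStop = NULL)
{
    TInteger nResult = 0;
    while (itInput != itInputEnd &&
        ConvertDigitToInteger(*itInput, nResult))
        ++itInput;
    if (pitInputStop != NULL)
        *pitInputStop = itInput;
    return nResult;
}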
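One thing wcstoul does that the simple converter doesn't is report overflow. If you need that back, a checked variant might look like the sketch below. TryConvertDigitsToInteger is a hypothetical name, not part of the code I measured, and I haven't timed it, so treat it as a starting point for your own profiling.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical checked variant. Parses the leading run of decimal digits in
// [itInput, itInputEnd) into rnResult. Returns false, leaving rnResult
// untouched, if there is no digit or the value does not fit in TInteger.
template <typename TInteger, typename TInputIter>
bool TryConvertDigitsToInteger(TInputIter itInput, TInputIter itInputEnd,
    TInteger & rnResult, TInputIter * pitInputStop = NULL)
{
    const TInteger nMax = std::numeric_limits<TInteger>::max();
    TInteger nValue = 0;
    bool bAnyDigit = false;
    bool bOverflow = false;
    for (; itInput != itInputEnd; ++itInput)
    {
        if (*itInput < '0' || *itInput > '9')
            break;
        const TInteger nDigit = static_cast<TInteger>(*itInput - '0');
        bAnyDigit = true;
        // nValue * 10 + nDigit fits exactly when nValue <= (nMax - nDigit) / 10.
        if (bOverflow || nValue > (nMax - nDigit) / 10)
            bOverflow = true; // keep consuming so the stop position marks the end of the digits
        else
            nValue = static_cast<TInteger>(nValue * 10 + nDigit);
    }
    if (pitInputStop != NULL)
        *pitInputStop = itInput;
    if (!bAnyDigit || bOverflow)
        return false;
    rnResult = nValue;
    return true;
}
```

With an unsigned char target, for example, "255" converts successfully while "256" is rejected. The extra comparison and division sit inside the hot loop, so measure before assuming this version keeps the same advantage over wcstoul.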