Batch file conversion - character set and BOM detection of html files

Sample for ScriptUtils.ByteArray.HexString


     This sample performs batch conversion of text files in different code pages (Unicode, utf-8, windows-1250 and others) to a single selected code page. The algorithm includes simple detection of the source file's code page from its BOM; when no BOM is present, it tries to read the charset declared in the HTML meta tag.
     You can choose any destination charset. See also "ByteArray - save unicode data (string) as utf-8 with BOM" for saving files with a BOM (Unicode little/big endian, utf-8).
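
The BOM check described above can be sketched as a small stand-alone VBScript function. This is only an illustration: CharSetFromBOM and BOMLength are hypothetical helpers, not part of ScriptUtils. They assume the leading bytes of a file are passed in as a hex string without separators, and they reuse the charset names from this sample ("utf-8", "unicodebig", "unicodelittle").

```vbscript
Option Explicit

' Hypothetical helper (not part of ScriptUtils): maps the hex string of a
' file's first bytes to a charset name, or "" when no known BOM is present.
Function CharSetFromBOM(HexPrefix)
  Dim h
  h = UCase(HexPrefix)
  If Left(h, 6) = "EFBBBF" Then
    CharSetFromBOM = "utf-8"           ' 3-byte UTF-8 BOM
  ElseIf Left(h, 4) = "FEFF" Then
    CharSetFromBOM = "unicodebig"      ' UTF-16 big endian BOM
  ElseIf Left(h, 4) = "FFFE" Then
    CharSetFromBOM = "unicodelittle"   ' UTF-16 little endian BOM
  Else
    CharSetFromBOM = ""                ' no BOM: fall back to other detection
  End If
End Function

' Hypothetical helper: number of BOM bytes to skip for a detected charset.
Function BOMLength(CharSet)
  Select Case CharSet
    Case "utf-8"
      BOMLength = 3
    Case "unicodebig", "unicodelittle"
      BOMLength = 2
    Case Else
      BOMLength = 0
  End Select
End Function

WScript.Echo CharSetFromBOM("EFBBBF3C68746D6C")   ' utf-8
WScript.Echo BOMLength("unicodelittle")           ' 2
```

Checking the three-byte utf-8 signature before the two-byte ones keeps the tests independent of each other; the file contents proper start at byte BOMLength + 1.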

Batch file conversion - character set and BOM detection of html files 
Const DestCharSet = "utf-8"
'Const DestCharSet = "ascii"
Dim FS
Set fs = CreateObject("Scripting.FileSystemObject")

ConvertFolder "f:\", "f:\1"

Sub ConvertFolder(ByVal InputPath, ByVal OutputPath)
  Dim InputFolder, File, Ext
  Set InputFolder = fs.GetFolder(InputPath)

  'Convert every .htm/.html file in this folder
  For Each File In InputFolder.Files
    Ext = LCase(fs.GetExtensionName(File.Name))
    If Ext = "htm" Or Ext = "html" Then
      Wscript.Echo File.Path
      'wscript.echo OutputPath & "\" & replace(file.path,":","")
      ConvertFile File.Path, OutputPath & "\" & File.Name, DestCharSet
    End If
  Next

  'Process subfolders recursively; all output goes to the same OutputPath
  Dim FilesFolder
  For Each FilesFolder In InputFolder.SubFolders
    ConvertFolder FilesFolder.Path, OutputPath
  Next
End Sub

Sub ConvertFile(SourceFileName, DestFileName, DestCharSet)
  'read the source file contents
  Dim FileContents
  Set FileContents = ReadOneFile(SourceFileName)

  'Convert to the destination charset
  Set FileContents = FileContents.CharSetConvert(DestCharSet)

  'Save to a destination file
  FileContents.SaveAs DestFileName 
End Sub 

Function ReadOneFile(FileName)
  Dim ByteArray
  Set ByteArray = CreateObject("ScriptUtils.ByteArray")

  'Read first two bytes from the file
  ByteArray.ReadFrom FileName,,2

  Select Case ByteArray.HexString
    'unicode big endian
    Case "FEFF": 
      ByteArray.CharSet = "unicodebig"
      'Read the file from 3rd byte to end.
      ByteArray.ReadFrom FileName,3
    'unicode little endian      
    Case "FFFE": 
      ByteArray.CharSet = "unicodelittle"
      'Read the file from 3rd byte to end.
      ByteArray.ReadFrom FileName,3
    Case Else: 
      'Read first three bytes from the file
      ByteArray.ReadFrom FileName,,3
      If ByteArray.HexString = "EFBBBF" Then 'unicode utf-8
        'read the file contents after the BOM header
        ByteArray.ReadFrom FileName,4
        ByteArray.CharSet = "utf-8"
      Else
        'read whole contents of the file in other cases
        ByteArray.ReadFrom FileName
        On Error Resume Next
        'try to detect charset from the data source
        ByteArray.CharSet = DetectCharSet(ByteArray.String)
        'Set some default charset (default is OEM)
        'if err<>0 then ByteArray.CharSet = "windows-1250"
      End If
  End Select
  Set ReadOneFile = ByteArray
End Function

'The Function detects charset from the source string data.
Function DetectCharSet(Data)
  On Error Resume Next
  Dim charset
  'the charset tag usually looks like
  '<meta http-equiv="Content-Type" content="text/html; charset=windows-1250">
  charset = Split(Data, "charset=", 2, vbTextCompare)(1)
  If Len(charset)>0 Then
    charset = Split(charset, """", 2, vbTextCompare)(0)
  End If
  DetectCharSet = charset 
End Function
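
The Split-based detection above expects the charset name to follow "charset=" directly and to end at the next double quote, so a declaration such as <meta charset="utf-8"> yields an empty result. As a hedged alternative, the same idea can be sketched with the built-in VBScript RegExp object. DetectCharSetRegExp is a hypothetical name used only for this illustration, and the set of characters accepted in a charset name is an assumption.

```vbscript
Option Explicit

' Hypothetical alternative to DetectCharSet: returns the declared charset
' name, or "" when the text contains no charset declaration.
Function DetectCharSetRegExp(Data)
  Dim re, matches
  Set re = New RegExp
  re.IgnoreCase = True
  ' "charset=", an optional single or double quote, then the charset name
  re.Pattern = "charset\s*=\s*[""']?([A-Za-z0-9_.:-]+)"
  Set matches = re.Execute(Data)
  If matches.Count > 0 Then
    DetectCharSetRegExp = matches(0).SubMatches(0)
  Else
    DetectCharSetRegExp = ""
  End If
End Function

' Prints windows-1250
WScript.Echo DetectCharSetRegExp("<meta http-equiv=""Content-Type"" content=""text/html; charset=windows-1250"">")
' Prints utf-8
WScript.Echo DetectCharSetRegExp("<meta charset='utf-8'>")
```

Because the function returns an empty string instead of raising an error, the caller can test the result explicitly and apply a default charset without relying on On Error Resume Next.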

Other links for the Batch file conversion - character set and BOM detection of html files sample


ByteArray works with safearray binary data: save/restore binary data from/to a disk, convert to a string/hexstring, codepage/charset conversions, Base64 conversion, etc.
     ByteArray is a COM class specially designed to work with Microsoft Windows Scripting engines (VBScript and JScript) in Active Server Pages, WSH, and CHM or HTA applications. It also works with VB.NET, Visual Basic (VBA, VB 5, VB 6, Word, Excel, Access), C#, J#, C++, ASP, ASP.NET, Delphi and T-SQL OLE functions; see the "Use ByteArray object" article. You can also use the object in other programming environments with COM support, such as PowerBuilder.
     Source code for ByteArray is available under the distribution license; please see the License page for ASP file upload and ScriptUtilities.


Huge ASP upload is an easy-to-use, high-performance ASP file upload component with a progress bar indicator. It lets you upload multiple files of up to 4GB to a disk or a database along with other form fields. A free version with upload progress, called Pure ASP upload, is written in plain VBS without components, so you do not need to install anything on the server. The installation package also contains the ScriptUtilities library, which lets you create high-performance log files, work with binary data, download multiple files with zip/arj compression, work with INI files and more.

© 1996 - 2011 Antonin Foller, Motobit Software
