Post by Peter Juuls:
Hi vim.org,
I have used Vim since version 4.x and love it, because
I am a command-line guy. I just downloaded the brand
new vim70w32.zip and installed it on my Windows 2000 PC.
BUT it has always been a mystery to me how to control
the character sets used in Vim, especially how to control the
Danish characters. I have read the FAQs, the
README_DOS.TXT files etc. with no luck. Could you
please help me, give me a hint?
Files created in Notepad.exe and in DOS-programs use
different character sets. When I run a TYPE command in
a command prompt on a Notepad file, the three extra
Danish characters are rubbish. And, when I open a
DOS file in Notepad, Danish characters are rubbish.
Can I switch character sets and have console Vim
always display Danish characters correctly, no matter
which editor created the file? That would be very
convenient.
My Windows has Regional Settings = Danish.
set nocompatible
source $VIMRUNTIME/mswin.vim
set helpfile=C:\UTIL\vim\vim70\doc\help.txt
Best regards
Peter Juuls
If you have some files using a Dos charset and others using a
Windows charset, the way to handle it is file by file. Here are a few
sections you should read in the help:
" 'encoding' (global) defines the way Vim internally represents the data
:help 'encoding'
" 'fileencoding' (local to buffer) defines how the file's data is
" represented on disk
:help 'fileencoding'
" 'fileencodings' (global, and with an s at the end) defines the
" heuristics used by Vim to guess the 'fileencoding' when reading a file
:help 'fileencodings'
" 'termencoding' (global) defines how your keyboard (and, in console
" Vim, your display) represents the data
:help 'termencoding'
" Modelines allow setting local options on a file-by-file basis
:help modeline
" See also how Vim names the various charsets
:help encoding-names
" and how to set the 'fileencoding' manually when reading or writing
" one particular file
:help ++opt
" etc.
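A modeline is just a specially formatted comment in the first or last few lines of the file itself, which Vim parses when loading it. Purely as an illustration (the file and option choice here are made up, and see the caveat that follows), the last line of a file could read:

```vim
" A modeline on the last line of some hypothetical dosfile.txt,
" asking Vim to treat that file as cp850:
" vim: set fileencoding=cp850 :
```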
I don't guarantee that setting the 'fileencoding' by means of a modeline
will work, however, because to read the modeline itself Vim must first
read the file: a chicken-and-egg problem.
Most of these options require that Vim be compiled with the +multi_byte
feature, even if you always set these options to single-byte (8-bit)
encodings. That may seem strange, but it is a design decision, and you
should be aware of it, or you may run into problems if you use a
-multi_byte version of Vim by mistake. To check, use ":version" (the
answer should include +multi_byte or +multi_byte_ime, with or without
/dyn), or ":echo has('multi_byte')" which should return a nonzero value,
normally 1. For instance, in your vimrc, you could write:
if has("multi_byte")
  " replace this comment by whatever is needed for Danish support
else
  echoerr "This Vim version wasn't compiled with multiple-charset support"
endif
The reason I mention 'termencoding' is that, by default, it is empty,
which means "use the value of 'encoding'". This is usually correct when
you start Vim, because the default value of 'encoding' is obtained from
your OS. But if you change 'encoding', for instance to set it to UTF-8,
which can represent any kind of text data known to man, the way your
keyboard represents your keystrokes doesn't change. Therefore, changing
'encoding' should be done using a construct similar to the following:
if &termencoding == ""
  let &termencoding = &encoding
endif
set encoding=utf-8
The 'encoding' option, which is global, must be set to some value which
allows representation of all the characters used by all the files you
may be editing, either concurrently, or successively without changing
'encoding'. Depending in part on which "special" characters are included
in your Danish text, Latin1 may or may not be good enough; UTF-8 always
will, at a slight cost in memory.
Now, the encoding names (for the buffer-local 'fileencoding' option).
IIUC, the names you need are probably the following:
  cp850 (the "international" Dos codepage), and
  cp1252 (Windows's "Western Europe" charset).
There are also:
  latin1 (aka ISO-8859-1), the ISO charset for Western Europe defined
  prior to the invention of the Euro currency, and
  iso-8859-15 (aka Latin9), a charset very similar to Latin1 but which
  includes the Euro sign.
The latter two are "international standard" charsets, not a property of
Bill Gates. ;-)
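If you are unsure which of these charsets a given file actually uses, one rough-and-ready way to find out is to reload the same buffer under each candidate encoding and check whether the Danish letters (æ, ø, å) display correctly:

```vim
" Re-edit the current file, forcing one interpretation at a time;
" the file on disk is not changed, only how Vim decodes it:
:e ++enc=cp850
:e ++enc=cp1252
:e ++enc=latin1
```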
You can check the Dos codepage by issuing the CHCP command (with no
arguments) at the prompt in a Dos box. I'm not sure how to check the
Windows charset.
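For what it's worth, you can run these checks without leaving Vim (this assumes a Windows console Vim, where :! shells out to cmd.exe):

```vim
" Ask the console for its current codepage (Windows only):
:!chcp
" Show what Vim thinks the terminal encoding is (an empty value
" means "same as 'encoding'"):
:echo &termencoding
:echo &encoding
```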
Now here is how you tell Vim a file's encoding, once 'encoding' is
already set to some "compatible" value:
:e ++enc=cp850 filename.ext
Since cp850 and cp1252 are both 8-bit encodings, it's not possible to
set the 'fileencodings' heuristics to automagically detect them both
without a modeline, because neither will, for any file, return the
"wrong charset" signal to the heuristic. This means that if you have
them both in the 'fileencodings' option, Vim will never use whichever of
them comes last. If your "most used" 8-bit charset is Windows-1252, then
you would "typically" use:
if has("multi_byte")
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  set encoding=utf-8
  set fileencodings=ucs-bom,utf-8,cp1252
  setglobal fileencoding=cp1252
else
  echoerr "ERROR: Can't handle multiple encodings! You need to recompile Vim!"
endif
(ucs-bom and utf-8 are Unicode heuristics, and _they_ can return a
"wrong charset" signal to the charset-detecting heuristic, which then
proceeds to check the file for the next charset in the list.) This will
detect 7-bit ASCII files (files which don't contain any character higher
than 127) as being in UTF-8. This is normal: the same data is
represented identically in 7-bit ASCII, in UTF-8, and indeed in the
ASCII-compatible first half of most 8-bit encodings, including the
Latin1 and Latin9 encodings mentioned above.
With the above settings, you should only need to use the ++enc argument
for files which are not in your "default" charset, meaning that
:e file1.txt
would open a file in Windows-1252; and
:new ++enc=cp850 file2.txt
would split the window to open a file in cp850.
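If cp850 files come up often, you could wrap the ++enc argument in a small user command; the name :EditDos here is made up, not a built-in:

```vim
" Hypothetical :EditDos command: open its file argument as cp850,
" so that  :EditDos file2.txt  stands for  :e ++enc=cp850 file2.txt
command! -nargs=1 -complete=file EditDos edit ++enc=cp850 <args>
```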
Best regards,
Tony.