Discussion:
Unicode chars NEL, FF, LS, PS
Steve Hall
2006-09-29 13:01:10 UTC
Does anyone here know if Vim respects the following Unicode characters
(represents them rather than just indicating literals):

http://en.wikipedia.org/wiki/Newline#Unicode

I'm not on a Unicode platform at the moment, but I'm wondering if Vim
could ever have the &listchars to do it like mined:

http://towo.net/mined/mined-uni.png


--
Steve Hall [ digitect dancingpaper com ]
Pádraig Brady
2006-09-29 13:24:11 UTC
Steve Hall wrote:
> Does anyone here know if Vim respects the following Unicode characters
> (represents them rather than just indicating literals):
>
> http://en.wikipedia.org/wiki/Newline#Unicode
>
> I'm not on a Unicode platform at the moment, but I'm wondering if Vim
> could ever have the &listchars to do it like mined:
>
> http://towo.net/mined/mined-uni.png
>
>

On [g]vim-7.0.17 on ubuntu-5.10 those "newline" chars
are not treated specially.

For LS and PS it displays them as a single space.
For NEL it shows <85>
For FF it shows ^L

listchars doesn't support these either.

I notice gedit displays newlines for PS and LS.

Pádraig.
A.J.Mechelynck
2006-09-29 23:14:38 UTC
Steve Hall wrote:
> Does anyone here know if Vim respects the following Unicode characters
> (represents them rather than just indicating literals):
>
> http://en.wikipedia.org/wiki/Newline#Unicode
>
> I'm not on a Unicode platform at the moment, but I'm wondering if Vim
> could ever have the &listchars to do it like mined:
>
> http://towo.net/mined/mined-uni.png
>
>

Vim is a text editor, not a word processor. It does not necessarily show
control characters as a word processor or a printer would. Even on a
non-Unicode platform, you should be able to run a +multibyte version of gvim,
set 'encoding' to UTF-8 while preserving the "locale" setting of 'encoding' in
'termencoding', and enter the characters according to ":help i_CTRL-V_digit"
to see what happens.

NEL (Next Line, 0x85) is an upper-ASCII control character. I expect Vim to
represent it as <85> when 'encoding' is set to UTF-8. This, however, depends
on the setting of the 'isprint' option. I don't know what this control
character means.

FF (Form Feed, 0x0C) is an ASCII control character; it should be represented
as ^L in Unicode just as in Latin1. When sent to a printer, it usually causes
a page eject.

LS (Line Separator, L SEP, U+2028) and PS (Paragraph Separator, P SEP, U+2029)
are "Format characters" according to Unicode
http://www.unicode.org/charts/PDF/U2000.pdf . They are followed in the charts
by "Left-to-Right Embedding", "Right-to-Left Embedding", "Pop Directional
Formatting" etc. I don't expect Vim to handle them otherwise than any other
character, i.e., fetch a glyph, if any (probably none) from your 'guifont'. In
my Gnome2 gvim with 'encoding' set to UTF-8, both U+2028 and U+2029 display as
single-width spaces.
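
For anyone who wants to check the code points above without relying on a font, here is a minimal Python sketch. It only queries Python's bundled Unicode database and says nothing about how Vim classifies these characters; the `CHARS` table and the `describe` helper are names made up for this illustration.

```python
import unicodedata

# The four characters discussed above, keyed by the abbreviations used in the thread.
CHARS = {
    "FF": "\u000c",
    "NEL": "\u0085",
    "LS": "\u2028",
    "PS": "\u2029",
}

def describe(ch):
    """Return (code point, general category, UTF-8 bytes in hex) for one character."""
    return (
        "U+%04X" % ord(ch),
        unicodedata.category(ch),
        ch.encode("utf-8").hex(),
    )

for name, ch in CHARS.items():
    print(name, *describe(ch))
# FF U+000C Cc 0c
# NEL U+0085 Cc c285
# LS U+2028 Zl e280a8
# PS U+2029 Zp e280a9
```

Note that the database reports LS and PS with the general categories Zl and Zp (line and paragraph separator) rather than Cf (format), while FF and NEL are both Cc (control).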


Best regards,
Tony.
Steve Hall
2006-09-30 01:04:10 UTC
On Sat, 2006-09-30 at 01:14 +0200, A.J.Mechelynck wrote:
> Steve Hall wrote:
> > Does anyone here know if Vim respects the following Unicode
> > characters (represents them rather than just indicating literals):
> >
> > http://en.wikipedia.org/wiki/Newline#Unicode
> >
> > I'm not on a Unicode platform at the moment, but I'm wondering if
> > Vim could ever have the &listchars to do it like mined:
> >
> > http://towo.net/mined/mined-uni.png
>
> Vim is a text editor, not a word processor. It does not necessarily
> show control characters as a word processor or a printer would.

However you might alternatively say that these floodgates were opened
when &list was invented. :)

> Even on a non-Unicode platform, you should be able to run a
> +multibyte version of gvim, set 'encoding' to UTF-8 while preserving
> the "locale" setting of 'encoding' in 'termencoding', and enter the
> characters according to ":help i_CTRL-V_digit" to see what happens.

Sometimes there's a font limitation, and I don't always trust what I
see.

> NEL (Next Line, 0x85) is an upper-ASCII control character. I expect
> Vim to represent it as <85> when 'encoding' is set to UTF-8. This,
> however, depends on the setting of the 'isprint' option. I don't
> know what this control character means.
>
> FF (Form Feed, 0x0C) is an ASCII control character; it should be
> represented as ^L in Unicode just as in Latin1. When sent to a
> printer, it usually causes a page eject.
>
> LS (Line Separator, L SEP, U+2028) and PS (Paragraph Separator, P
> SEP, U+2029) are "Format characters" according to Unicode
> http://www.unicode.org/charts/PDF/U2000.pdf . They are followed in
> the charts by "Left-to-Right Embedding", "Right-to-Left Embedding",
> "Pop Directional Formatting" etc. I don't expect Vim to handle them
> otherwise than any other character, i.e., fetch a glyph, if any
> (probably none) from your 'guifont'. In my Gnome2 gvim with
> 'encoding' set to UTF-8, both U+2028 and U+2029 display as
> single-width spaces.

It would be a lot to ask of any text editor to respect these new
Unicode formatting characters. But I do think the authors of the spec
intended these to be additions to the traditional CR and LF. I've been
involved in a "why can't Vim do X, editor Y can do it" discussion, so
my interest here is not actually using these chars myself. But there
are likely some cases where they will be useful, more and more as
software adopts Unicode. I'd personally only care that &listchars has
an option for them, on screen they act the same as any other line
ending or tab char.
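
To make the request concrete, the following Python sketch shows the kind of substitution a listchars-style display would perform. It is only an illustration of the idea: the marker strings and the `show_separators` helper are hypothetical, and neither Vim nor mined is claimed to use these glyphs or this approach.

```python
# Hypothetical markers; neither Vim nor mined is claimed to use these exact glyphs.
MARKERS = {
    "\u000c": "^L",     # FF, Form Feed
    "\u0085": "<NEL>",  # NEL, Next Line
    "\u2028": "<LS>",   # LS, Line Separator
    "\u2029": "<PS>",   # PS, Paragraph Separator
}

def show_separators(text, markers=MARKERS):
    """Replace each separator character with its visible marker."""
    return "".join(markers.get(ch, ch) for ch in text)

print(show_separators("one\u2028two\u2029three\u0085four\u000cfive"))
# prints: one<LS>two<PS>three<NEL>four^Lfive
```

A real implementation would also have to decide how wide each marker is on screen, which this sketch ignores.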


--
Steve Hall [ digitect dancingpaper com ]
A.J.Mechelynck
2006-09-30 01:15:58 UTC
Steve Hall wrote:
[...]
> It would be a lot to ask of any text editor to respect these new
> Unicode formatting characters. But I do think the authors of the spec
> intended these to be additions to the traditional CR and LF. I've been
> involved in a "why can't Vim do X, editor Y can do it" discussion, so
> my interest here is not actually using these chars myself. But there
> are likely some cases where they will be useful, more and more as
> software adopts Unicode. I'd personally only care that &listchars has
> an option for them, on screen they act the same as any other line
> ending or tab char.
>
>

Well, they don't. The only recognised line ending in Vim is the OS-specific
one: CR on the Mac, LF under Unix, CR+LF on Windows. IIUC, in Unicode the use
of embedded format characters is deprecated in favour of markup, e.g. in HTML
<span dir="rtl">...</span> rather than RLE ... PDF, <P>...</P> rather than
P-SEP, etc.
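
For comparison, some tools do treat the characters from this thread as line boundaries. The Python sketch below contrasts `str.splitlines()`, which splits on FF, NEL, LS and PS as well as CR, LF and CR+LF, with a plain split on LF. It illustrates Python's behaviour only and is not a statement about Vim; the sample string is made up for the example.

```python
# One sample string containing every terminator mentioned in the thread.
text = "a\nb\r\nc\rd\x0ce\x85f\u2028g\u2029h"

# str.splitlines() treats all of these as line boundaries.
unicode_lines = text.splitlines()

# Splitting on LF alone leaves the other characters embedded in the text.
lf_lines = text.split("\n")

print(unicode_lines)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
print(len(lf_lines))  # 3
```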


Best regards,
Tony.