What does the UGII_TMP_DIR option mean?

Extremely urgent!!! On 64-bit Windows 7 Ultimate, UG 8.0 has to be started by double-clicking ugii.bat under NX 8.0\UGII\. Why is that? The machine has an ATI graphics card.
What is the difference between forcing UG 8.0 to start via ugii.bat and starting it normally, and can it harm the graphics card or the computer? Also, why does this not happen on 32-bit systems? A professional answer would be appreciated.
Accepted answer:
This is one of the normal ways to start UG; it will not harm the machine.
Asker: But why is it like this? It is really inconvenient!
Answer: Add a desktop shortcut to that batch file and you can then start UG directly from the desktop. Specifically: right-click the desktop, choose New, then Shortcut, click Browse until you find ugii.bat, and confirm.
One other answer:
I have never heard that this is necessary. I have used versions 4, 5, 6 and 8, and all of them started by simply double-clicking the shortcut.
Asker: That does not work here; double-clicking makes the window flash and close immediately!
My UG 8.0 no longer runs; the error message is: environment file ugii_env.dat does not exist
How can this be solved?
Answer: The file is missing; you could ask someone to send it to you.
Asker: Do you have it? My e-mail address is:
Answer: Sorry, I do not have it.
Markus Kuhn’s free wcwidth() implementation can be used by applications on platforms where the C library does not yet provide an equivalent function to find how many column positions a character or string will occupy on a UTF-8 terminal emulator screen.
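As a small illustration, here is a C sketch (assuming a POSIX-style wcwidth() and a UTF-8 locale; the function name utf8_columns is made up) that sums the column widths of a UTF-8 string:

#define _XOPEN_SOURCE 700   /* for wcwidth() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

/* Return the number of terminal columns occupied by the UTF-8
   string s, or -1 on malformed or non-printable input. */
int utf8_columns(const char *s)
{
    mbstate_t st;
    int cols = 0;
    memset(&st, 0, sizeof st);
    while (*s) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, MB_CUR_MAX, &st);
        if (n == (size_t)-1 || n == (size_t)-2)
            return -1;                /* invalid UTF-8 sequence */
        int w = wcwidth(wc);          /* 0, 1 or 2 columns */
        if (w < 0)
            return -1;                /* control character etc. */
        cols += w;
        s += n;
    }
    return cols;
}

int main(void)
{
    setlocale(LC_CTYPE, "");          /* must select a UTF-8 locale */
    printf("%d\n", utf8_columns("日本語"));   /* prints 6 */
    return 0;
}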
Markus Kuhn’s transtab is a
transliteration table for applications that have to make a best-effort
conversion from Unicode to ASCII or some 8-bit character set. It
contains a comprehensive list of substitution strings for Unicode
characters, comparable to the fallback notations that people use
commonly in email and on typewriters to represent unavailable
characters. The table comes in ISO/IEC TR 14652 format, to allow
simple inclusion into POSIX locale definition files.
What is the status of Unicode support for various X widget libraries?
The Pango project added full-featured Unicode support to GTK+.
Qt has supported the use of
*-ISO10646-1 fonts since version 2.0.
FLTK Unicode support was prepared by
Jean-Marc Lienher, based on his Xutf8 Unicode display library.
What packages with UTF-8 support are currently under development?
Native Unicode support is planned for Emacs 23. If you are
interested in contributing or testing, please join the
emacs-devel@gnu.org mailing list.
Work is also under way on a complete revision of the VT100 emulator built
into the Linux kernel, which will improve the simplistic UTF-8 support
already there.
How does UTF-8 support work under Solaris?
Starting with Solaris 2.8, UTF-8 is at least partially supported.
To use it, just set one of the UTF-8 locales, for instance by typing
setenv LANG en_US.UTF-8
in a C shell.
Now the dtterm terminal emulator can be used to input
and output UTF-8 text and the mp print filter will print
UTF-8 files on PostScript printers. The en_US.UTF-8
locale is at the moment supported by Motif and CDE desktop
applications and libraries, but not by OpenWindows, XView, and
OPENLOOK DeskSet applications and libraries.
For more information, read Sun’s documentation.
Can I use UTF-8 on the Web?
Yes. There are two ways in which an HTTP server can indicate to a
client that a document is encoded in UTF-8:
Make sure that the HTTP header of a document contains the line
Content-Type: text/html; charset=utf-8
if the file is HTML, or the line
Content-Type: text/plain; charset=utf-8
if the file is plain text. How this can be achieved depends on your
web server. If you use Apache
and you have a subdirectory in which all *.html or *.txt files are
encoded in UTF-8, then create there a file .htaccess and add to it the two lines
AddType text/html;charset=UTF-8 html
AddType text/plain;charset=UTF-8 txt
A webmaster can modify /etc/httpd/mime.types to make the same change
for all subdirectories simultaneously.
If you cannot influence the HTTP headers that the web server
automatically prefixes to your documents, then add in an HTML document
under HEAD the element
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
which usually has the same effect. This obviously works only for HTML
files, not for plain text. It also announces the encoding of the file
to the parser only after the parser has already started to read the
file, so it is clearly the less elegant approach.
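For illustration, here is a minimal, hypothetical CGI program (not from the FAQ) that emits the charset declaration in its HTTP header; the \xe2\x82\xac bytes are the UTF-8 encoding of the euro sign:

#include <stdio.h>

int main(void)
{
    /* The header must end with a blank line before the body. */
    printf("Content-Type: text/html; charset=utf-8\r\n\r\n");
    printf("<html><head><title>UTF-8 test</title></head>\n");
    printf("<body><p>Euro sign: \xe2\x82\xac</p></body></html>\n");
    return 0;
}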
The currently most widely used browsers support UTF-8 well enough
to generally recommend UTF-8 for use on web pages. The old Netscape 4
browser used an annoyingly large single font for displaying any UTF-8
document. Best upgrade to Mozilla, Netscape 6 or some other recent
browser (Netscape 4 is generally very buggy and not maintained any longer).
There is also the question of how non-ASCII characters entered into
HTML forms are encoded in the subsequent HTTP GET or POST request that
transfers the field contents to a CGI script on the server.
Unfortunately, both standardization and implementation are still a huge mess here, as
discussed in the
FORM submission and i18n tutorial by Alan Flavell. We can only
hope that a practice of doing all this in UTF-8 will emerge
eventually. See also the related discussion in the Mozilla bug tracker.
How are PostScript glyph names related to UCS codes?
See Adobe’s Unicode
and Glyph Names guide.
Are there any well-defined UCS subsets?
With over 40000 characters, the design of a font that covers every
single Unicode character is an enormous project, not just regarding
the number of glyphs that need to be created, but also in terms of the
calligraphic expertise required to do an adequate job for each script.
As a result, there are hardly any fonts that try to cover “all of
Unicode”. While a few projects have attempted to create single
complete Unicode fonts, their quality is not comparable with that of
many good smaller fonts. For example, the Unicode and ISO 10646 books
are still printed using a large collection of different fonts that
only together cover the entire repertoire. Any high-quality font can
only cover the Unicode subset for which the designer feels competent
and confident.
Older, regional character encoding standards defined both an encoding
and a repertoire of characters that an individual calligrapher could
handle. Unicode lacks the latter, but in the interest of
interoperability it is useful to have a handful of
standardized subsets, each a few hundred to a few thousand characters
large and targeted at particular markets, that font designers can
practically aim to cover. A number of different UCS subsets have
already been established:
Microsoft’s WGL4 is a set of 650 characters that covers all
the 8-bit MS-DOS, Windows, Mac, and ISO code pages that Microsoft had
used before. All Windows fonts now cover at least the WGL4 repertoire.
WGL4 is a superset of CEN MES-1.
Three European
UCS subsets MES-1, MES-2, and MES-3 have been defined by the
European standards committee CEN/TC304 in CWA 13873.
MES-1 is a very small Latin subset with only 335 characters. It
contains exactly all characters found in ISO 6937 plus the EURO SIGN.
This means MES-1 contains all characters of ISO 8859 parts
1,2,3,4,9,10,15. [Note: If your aim is to provide only the cheapest
and simplest reasonable Central European UCS subset, I would implement
MES-1 plus the following important 14 additional characters found in
Windows code page 1252 but not in MES-1: U+0192, U+02C6, U+02DC,
U+2013, U+2014, U+201A, U+201E, U+2020, U+2021, U+2022, U+2026,
U+2030, U+2039, U+203A.]
MES-2 is a Latin/Greek/Cyrillic/Armenian/Georgian subset with 1052
characters. It covers every language and every 8-bit code page used in
Europe (not just the EU!) and European language countries. It also
adds a small collection of mathematical symbols for use in technical
documentation. MES-2 is a superset of MES-1. If you are developing
only for a European or Western market, MES-2 is the recommended
repertoire. [Note: For bizarre committee-politics reasons, the
following eight WGL4 characters are missing from MES-2: U+2113,
U+212E, U+2215, U+25A1, U+25AA, U+25AB, U+25CF, U+25E6. If you
implement MES-2, you should definitely also add those and then you can
claim WGL4 conformance in addition.]
MES-3 is a very comprehensive UCS subset with 2819 characters. It
simply includes every UCS collection that seemed of potential use to
European users. This is for the more ambitious implementors. MES-3 is
a superset of MES-2 and WGL4.
JIS X 0221 specifies 7 non-overlapping UCS subsets for
Japanese users:
Basic Japanese (6884 characters): JIS X 0208, JIS X 0201
Japanese Non-ideographic Supplement (1913 characters): JIS X 0212
non-kanji, plus various other non-kanji
Japanese Ideographic Supplement 1 (918 characters): some JIS X 0212 kanji
Japanese Ideographic Supplement 2 (4883 characters): remaining JIS X 0212 kanji
Japanese Ideographic Supplement 3 (8745 characters): remaining
Chinese characters
Full-width Alphanumeric (94 characters): for compatibility
Half-width Katakana (63 characters): for compatibility
The ISO 10646 standard splits up its repertoire into a number of
collections that can be used to define and document implemented
subsets. Unicode defines similar, but not quite identical, blocks of
characters, which correspond to sections in the Unicode standard.
RFC 1815 is a memo written in 1995 by someone who obviously did not like ISO 10646
and was unaware of JIS X 0221. It discusses a UCS subset called
“ISO-10646-J-1” consisting of 14 UCS collections, some of which are
intersected with JIS X 0208. This is just what a particular font in an
old Japanese Windows NT version from 1995 happened to implement. RFC
1815 is completely obsolete and irrelevant today and should best be ignored.
Markus Kuhn has defined in the ucs-fonts.tar.gz README three UCS
subsets TARGET1, TARGET2, TARGET3 that are sensible extensions of the
corresponding MES subsets and that were the basis for the completion
of this xterm font package.
Markus Kuhn’s uniset
Perl script
allows convenient set arithmetic over UCS subsets for anyone who wants
to define a new one or wants to check the coverage of an implementation.
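To illustrate the kind of set arithmetic such a tool performs, here is a small C sketch; the range representation and the sample ranges are invented for illustration, not a real subset definition:

#include <stdio.h>

struct range { unsigned int first, last; };   /* inclusive code points */

/* Is code point c a member of the subset? (linear scan for brevity) */
static int in_subset(const struct range *set, int n, unsigned int c)
{
    for (int i = 0; i < n; i++)
        if (c >= set[i].first && c <= set[i].last)
            return 1;
    return 0;
}

int main(void)
{
    /* hypothetical subset: printable Basic Latin + Latin-1 Supplement */
    const struct range demo[] = { {0x0020, 0x007E}, {0x00A0, 0x00FF} };
    /* report which of these code points are missing from the subset */
    const unsigned int wanted[] = { 0x0041, 0x20AC, 0x00E9 };
    for (int i = 0; i < 3; i++)
        if (!in_subset(demo, 2, wanted[i]))
            printf("missing: U+%04X\n", wanted[i]);   /* prints U+20AC */
    return 0;
}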
What issues are there to consider when converting encodings?
The Unicode Consortium maintains a collection of mapping
tables between Unicode and various older encoding standards. It is
important to understand that the primary purpose of these tables was
to demonstrate that Unicode is a superset of the mapped legacy
encodings, and to document the motivation and origin behind those
Unicode characters that were included into the standard primarily for
round-trip compatibility reasons with older character sets. The
implementation of good character encoding conversion routines is a
significantly more complex task than just blindly applying these
example mapping tables! This is because some character sets
distinguish characters that others unify.
The Unicode mapping tables alone are, to some degree, well suited for
directly converting text from the older encodings to Unicode. High-end
conversion tools nevertheless should provide interactive mechanisms,
where characters that are unified in the legacy encoding but
distinguished in Unicode can interactively or semi-automatically be
disambiguated on a case-by-case basis.
Conversion in the opposite direction from Unicode to a legacy
character set requires non-injective (= many-to-one) extensions of
these mapping tables. Several Unicode characters have to be mapped to
a single code point in many legacy encodings. The Unicode consortium
currently does not maintain standard many-to-one tables for this
purpose and does not define any standard behavior of coded character
set conversion tools.
Here are some examples for the many-to-one mappings that have to be
handled when converting from Unicode into something else:
UCS characters → equivalent character in target code:
U+00B5 MICRO SIGN, U+03BC GREEK SMALL LETTER MU → 0xB5 in ISO 8859-1
U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, U+212B ANGSTROM SIGN → 0xC5 in ISO 8859-1
U+03B2 GREEK SMALL LETTER BETA, U+00DF LATIN SMALL LETTER SHARP S → 0xE1 in CP437
U+03A9 GREEK CAPITAL LETTER OMEGA, U+2126 OHM SIGN → 0xEA in CP437
U+03B5 GREEK SMALL LETTER EPSILON, U+2208 ELEMENT OF → 0xEE in CP437
U+005C REVERSE SOLIDUS, U+FF3C FULLWIDTH REVERSE SOLIDUS → 0x2140 in JIS X 0208
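A minimal C sketch of such a many-to-one table, containing only the example equivalences shown above (a real converter would need complete, sorted tables and a policy for unmapped characters):

#include <stdio.h>

struct fallback { unsigned int ucs; unsigned char byte; const char *code; };

static const struct fallback table[] = {
    { 0x00B5, 0xB5, "ISO 8859-1" },   /* MICRO SIGN */
    { 0x03BC, 0xB5, "ISO 8859-1" },   /* GREEK SMALL LETTER MU */
    { 0x00C5, 0xC5, "ISO 8859-1" },   /* LATIN CAPITAL A WITH RING ABOVE */
    { 0x212B, 0xC5, "ISO 8859-1" },   /* ANGSTROM SIGN */
    { 0x03B2, 0xE1, "CP437" },        /* GREEK SMALL LETTER BETA */
    { 0x00DF, 0xE1, "CP437" },        /* LATIN SMALL LETTER SHARP S */
};

int main(void)
{
    unsigned int c = 0x212B;          /* ANGSTROM SIGN */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].ucs == c)
            printf("U+%04X -> 0x%02X in %s\n", c, table[i].byte, table[i].code);
    return 0;
}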
A first approximation of such many-to-one tables can be generated
from available normalization information, but these then still have to
be manually extended and revised. For example, it seems obvious that
the character 0xE1 in the original IBM PC character set was meant to
be usable as both a Greek small beta (because it is located between
the code positions for alpha and gamma) and as a German sharp-s
character (because that code is produced when pressing this letter on
a German keyboard). Similarly 0xEE can be either the mathematical
element-of sign, as well as a small epsilon. These characters are not
Unicode normalization equivalents, because although they look similar
in low-resolution video fonts, they are very different characters in
high-quality typography. IBM’s mapping
tables for CP437 reflected one usage in some cases, Microsoft’s
the other, both equally sensible. A good code converter should aim to
be compatible with both, and not just blindly use the Microsoft
mapping table alone when converting from Unicode.
The Unicode character database does contain, in field 5, the Character
Decomposition Mapping that can be used to generate some of the above example
mappings automatically. As a rule, the output of a
Unicode-to-Something converter should not depend on whether the
Unicode input has first been converted into Normalization Form
C or not. For equivalence information on Chinese, Japanese, and
Korean Han/Kanji/Hanja characters, use the Unihan database.
In the cases of the IBM PC characters in the above examples, where the
normalization tables do not offer adequate mapping, the
cross-references to similar looking characters in the Unicode book are
a valuable source of suggestions for equivalence mappings. In the end,
which mappings are used and which not is a matter of taste and
observed usage.
The Unicode consortium used to maintain mapping tables to CJK
character set standards, but has declared them to be obsolete, because
their presence on the Unicode web server led to the development of a
number of inadequate and naive EUC converters. In particular, the (now
obsolete) CJK Unicode mapping tables had to be slightly modified
sometimes to preserve information in combination encodings. For
example, the standard mappings provide round-trip compatibility for
conversion chains ASCII to Unicode to ASCII as well as for JIS X 0208
to Unicode to JIS X 0208. However, the EUC-JP encoding covers the
union of ASCII and JIS X 0208, and the UCS repertoire covered by the
ASCII and JIS X 0208 mapping tables overlaps for one character, namely
U+005C REVERSE SOLIDUS. EUC-JP converters therefore have to use a
slightly modified JIS X 0208 mapping table, such that the JIS X 0208
code 0x2140 (0xA1 0xC0 in EUC-JP) gets mapped to U+FF3C FULLWIDTH
REVERSE SOLIDUS. This way, round-trip compatibility from EUC-JP to
Unicode to EUC-JP can be guaranteed without any loss of information.
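A C sketch of this special case (the EUC-JP bytes follow from adding 0x80 to each JIS X 0208 byte; the function name is made up, and everything beyond these two characters is left out):

#include <stdio.h>

/* Toy Unicode -> EUC-JP converter covering only the two backslash
   characters discussed above.  Returns the number of bytes written
   to out, or 0 if the character is not covered here. */
static int ucs_to_eucjp(unsigned int c, unsigned char *out)
{
    if (c == 0x005C) {        /* REVERSE SOLIDUS -> ASCII */
        out[0] = 0x5C;
        return 1;
    }
    if (c == 0xFF3C) {        /* FULLWIDTH REVERSE SOLIDUS -> JIS 0x2140 */
        out[0] = 0x21 + 0x80; /* 0xA1 */
        out[1] = 0x40 + 0x80; /* 0xC0 */
        return 2;
    }
    return 0;
}

int main(void)
{
    unsigned char buf[2];
    int n = ucs_to_eucjp(0xFF3C, buf);
    for (int i = 0; i < n; i++)
        printf("%02X ", buf[i]);      /* prints: A1 C0 */
    printf("\n");
    return 0;
}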
The W3C Note “XML Japanese Profile” provides further guidance on
this issue. Another problem area is compatibility with older
conversion tables, as explained in an essay by
Tomohiro Kubota.
In addition to just using standard normalization mappings,
developers of code converters can also offer transliteration support.
Transliteration is the conversion of a Unicode character into a
graphically and/or semantically similar character in the target code,
even if the two are distinct characters in Unicode after
normalization. Examples of transliteration:
UCS characters → equivalent character in target code:
U+0022 QUOTATION MARK,
U+201C LEFT DOUBLE QUOTATION MARK,
U+201D RIGHT DOUBLE QUOTATION MARK,
U+201E DOUBLE LOW-9 QUOTATION MARK,
U+201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK → 0x22 in ISO 8859-1
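A C sketch of this fallback (the function name is invented; only the quotation-mark example above is handled):

#include <stdio.h>

/* Transliterate c to ISO 8859-1, falling back from the typographic
   double quotes to 0x22; returns -1 if no mapping is known. */
static int latin1_translit(unsigned int c)
{
    switch (c) {
    case 0x201C:              /* LEFT DOUBLE QUOTATION MARK */
    case 0x201D:              /* RIGHT DOUBLE QUOTATION MARK */
    case 0x201E:              /* DOUBLE LOW-9 QUOTATION MARK */
    case 0x201F:              /* DOUBLE HIGH-REVERSED-9 QUOTATION MARK */
        return 0x22;          /* QUOTATION MARK */
    default:
        return c <= 0xFF ? (int)c : -1;
    }
}

int main(void)
{
    printf("%c\n", latin1_translit(0x201C));   /* prints " */
    return 0;
}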
The Unicode Consortium does not provide or maintain any standard
transliteration tables at this time. CEN/TC304 has a draft report
“European fallback rules” on recommended ASCII fallback characters for
MES-2 in the pipeline, but this is not yet mature. Which
transliterations are appropriate or not can in some cases depend on
language, application field, and most of all personal preference.
Available Unicode transliteration tables include, for example, those
found in Bruno Haible’s libiconv, the glibc 2.2 locales, and
Markus Kuhn’s transtab.
Is X11 ready for Unicode?
The latest version of the X Consortium’s sample
implementation of the X11 Window System standards dates from 2005.
The bulk of the current X11 standards
and parts of the sample implementation still pre-date widespread
interest in Unicode under Unix.
Among the things that have already been fixed are:
Keysyms: Since X11R6.9, a keysym value has been allocated
for every Unicode character in Appendix A of the X Window System
Protocol specification. Any UCS character in the range U-00000100
to U-00FFFFFF can now be represented by a keysym value in the range
0x01000100 to 0x01ffffff (see the sketch after this list). This scheme
was proposed by Markus Kuhn in
1998 and has been supported by a number of applications for many
years, starting with xterm. The revised Appendix A now also contains
an official UCS cross reference column in its table of pre-Unicode
legacy keysyms.
UTF-8 locales: The X11R6.8 sample implementation added
support for UTF-8 locales.
Fonts: A number of comprehensive Unicode standard fonts
were added in X11R6.8, and they are now supported by some of the
classic standard tools, such as xterm.
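The keysym rule above is simple to compute; a C sketch (the function name is invented for illustration):

#include <stdio.h>

/* Map a UCS value to its X11 keysym per the scheme described above:
   keysym = 0x01000000 | ucs for U+0100..U+FFFFFF.  Latin-1 characters
   already have identical legacy keysym values. */
static unsigned long ucs_to_keysym(unsigned int ucs)
{
    if (ucs >= 0x20 && ucs <= 0xFF)
        return ucs;                   /* Latin-1 legacy keysyms */
    if (ucs >= 0x100 && ucs <= 0xFFFFFF)
        return 0x01000000UL | ucs;    /* Unicode keysym range */
    return 0;                         /* no keysym */
}

int main(void)
{
    printf("0x%08lx\n", ucs_to_keysym(0x20AC));  /* prints 0x010020ac */
    return 0;
}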
There remain a number of problems in the X11 standards and some
inconveniences in the sample implementation for Unicode users that
still need to be fixed in one of the next X11 releases:
UTF-8 cut and paste: The ICCCM
standard still does not specify how to transfer UCS strings in
selections. Some vendors have added UTF-8 as yet another encoding to
the existing compound text
mechanism (CTEXT). This is not a good solution for
at least the following reasons:
CTEXT is a rather complicated ISO 2022 mechanism and Unicode
offers the opportunity to provide not just another add-on to CTEXT,
but to replace the entire monster with something far simpler, more
convenient, and equally powerful.
Many existing applications can communicate selections via CTEXT,
but do not support a newly added UTF-8 option. A user of CTEXT has to
decide whether to use the old ISO 2022 encodings or the new UTF-8
encoding, but both cannot be offered simultaneously. In other words,
adding UTF-8 to CTEXT seriously breaks backwards compatibility with
existing CTEXT applications.
The current CTEXT specification even explicitly forbids the
addition of UTF-8 in section 6: “ISO registered ‘other coding systems’
are not used in Compound Text; extended segments are the only
mechanism for non-2022 encodings.”
Juliusz Chroboczek has written an Inter-Client Exchange of Unicode Text draft proposal for an
extension of the ICCCM to handle UTF-8 selections with a new
UTF8_STRING atom that can be used as a property type and selection
target. This clean approach fixes all of the above problems.
UTF8_STRING is just as state-less and easy to use as the existing
STRING atom (which is reserved exclusively for ISO 8859-1 strings and
therefore not usable for UTF-8), and adding a new selection target
allows applications to offer selections in both the old CTEXT and the
new UTF8_STRING format simultaneously, which maximizes
interoperability. The use of UTF8_STRING can be negotiated between the
selection holder and requestor, leading to no compatibility issues
whatsoever. Markus Kuhn has prepared an ICCCM update
that adds the necessary definition to the standard. Current
status: The UTF8_STRING atom has now been officially registered with X.Org,
and we hope for an update of the ICCCM in one of the next releases.
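For illustration, here is a minimal Xlib sketch that requests the PRIMARY selection as UTF8_STRING (compile with -lX11; error handling and INCR incremental transfers are omitted, and the property name is arbitrary):

#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/Xatom.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) return 1;
    Window w = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                   0, 0, 1, 1, 0, 0, 0);
    Atom utf8 = XInternAtom(dpy, "UTF8_STRING", False);
    Atom prop = XInternAtom(dpy, "MY_SEL_BUF", False);  /* arbitrary */

    XConvertSelection(dpy, XA_PRIMARY, utf8, prop, w, CurrentTime);
    for (;;) {
        XEvent ev;
        XNextEvent(dpy, &ev);
        if (ev.type == SelectionNotify) {
            if (ev.xselection.property == None) {
                fprintf(stderr, "owner cannot provide UTF8_STRING\n");
            } else {
                Atom type; int fmt; unsigned long n, rest;
                unsigned char *data = NULL;
                XGetWindowProperty(dpy, w, prop, 0, 1 << 20, True,
                                   utf8, &type, &fmt, &n, &rest, &data);
                if (data) {
                    fwrite(data, 1, n, stdout);   /* raw UTF-8 bytes */
                    XFree(data);
                }
            }
            break;
        }
    }
    XCloseDisplay(dpy);
    return 0;
}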
Application window properties: In order to assist the
window manager in correctly labeling windows, the ICCCM 2.0
specification requires applications to assign properties such as
WM_NAME, WM_ICON_NAME and WM_CLIENT_MACHINE to each window. The old
ICCCM 2.0 (1993) defines these to be of the polymorphic type TEXT,
which means that they can have their text encoding indicated using one
of the property types STRING (ISO 8859-1), COMPOUND_TEXT (an ISO 2022
subset), or C_STRING (unknown character set). Simply adding
UTF8_STRING as a new option for TEXT would break backwards
compatibility with old window managers that do not know about this
type. Therefore, the freedesktop.org draft
standard developed in the Window Manager
Specification Project adds new additional window properties
_NET_WM_NAME, _NET_WM_ICON_NAME, etc. that have type UTF8_STRING.
Inefficient font data structures:
The Xlib API and X11 protocol data structures used for representing
font metric information are extremely inefficient when handling
sparsely populated fonts. The most common way of accessing a font in
an X client is a call to XLoadQueryFont(), which allocates memory for
an XFontStruct and fetches its content from the server. XFontStruct
contains an array of XCharStruct entries (12 bytes each). The size of
this array is the code position of the last character minus the code
position of the first character plus one. Therefore, any
“*-iso10646-1” font that contains both U+0020 and U+FFFD will cause an
XCharStruct array with 65502 elements to be allocated (even for
CharCell fonts), which requires 786 kilobytes of client-side memory
and data transmission, even if the font contains only a thousand
characters.
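The arithmetic above is easy to reproduce:

#include <stdio.h>

int main(void)
{
    unsigned int min_char = 0x0020;   /* U+0020 SPACE */
    unsigned int max_char = 0xFFFD;   /* U+FFFD REPLACEMENT CHARACTER */
    unsigned long entries = (unsigned long)max_char - min_char + 1;
    unsigned long bytes = entries * 12;   /* one 12-byte XCharStruct each */

    printf("%lu entries, %lu bytes (~%lu kB)\n",
           entries, bytes, bytes / 1000);  /* 65502 entries, ~786 kB */
    return 0;
}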
A few workarounds have been used so far:
The non-Asian -misc-fixed-*-iso10646-1 fonts that
come with XFree86 4.0 contain no characters above U+31FF. This reduces
the memory requirement to 153 kilobytes, which is still bad, but much
less so. (There are actually many useful characters above U+31FF
present in the BDF files, waiting for the day when this problem will
be fixed, but they currently all have an encoding of -1 and are
therefore ignored by the X server. If you need these characters, then
just install the original fonts without
applying the bdftruncate script).
Starting with XFree86 4.0.3, the truncation of a BDF font can also
be done by specifying a character code subrange at the end of the
XLFD, as described in the XLFD
specification, section 3.1.2.12. For example,
-Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO10646-1[0x1200_0x137f]
will load only the Ethiopic part of this BDF font with a
correspondingly nicely small XFontStruct (see the sketch after this list). Earlier X server versions
will simply ignore the font subset brackets and will give you the full
font, so there is no compatibility problem with using that.
Bruno Haible has written a BIGFONT protocol extension for XFree86
4.0, which uses a compressed transmission of XCharStruct from server
to client and also uses shared memory in Xlib between several clients
which have loaded the same font.
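A usage sketch for the XLFD subrange workaround mentioned above (compile with -lX11; assumes the font is installed):

#include <stdio.h>
#include <X11/Xlib.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) return 1;
    /* Load only the Ethiopic rows U+1200..U+137F of the big font. */
    XFontStruct *f = XLoadQueryFont(dpy,
        "-Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO10646-1"
        "[0x1200_0x137f]");
    if (f) {
        printf("font loaded, %d properties\n", f->n_properties);
        XFreeFont(dpy, f);
    }
    XCloseDisplay(dpy);
    return 0;
}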
These workarounds do not solve the underlying problem that
XFontStruct is unsuitable for sparsely populated fonts, but they do
provide a significant efficiency improvement without requiring any
changes in the API or client source code. One real solution would be
to extend or replace XFontStruct with something slightly more flexible
that contains a sorted list or hash table of characters as opposed to
an array. This redesign of XFontStruct would at the same time also
allow the addition of the urgently needed provisions for combining
characters and ligatures.
Another approach would be to introduce a new font encoding, which
could be called for instance “ISO10646-C” (the C stands for combining,
complex, compact, or character-glyph mapped, as you prefer). In this
encoding, the numbers assigned to each glyph are really font-specific
glyph numbers and are not equivalent to any UCS character code
positions. The information necessary to do a character-to-glyph
mapping would have to be stored in new, yet-to-be-standardized font
properties. This new font encoding would be used by applications together with a
few efficient C functions that perform the character-to-glyph code conversion:
makeiso10646cglyphmap(XFontStruct *font, iso10646cglyphmap *map)
Reads the character-to-glyph mapping table from the font
properties into a compact and efficient in-memory representation.
freeiso10646cglyphmap(iso10646cglyphmap *map)
Frees that in-memory representation.
mbtoiso10646c(char *string, iso10646cglyphmap *map, XChar2b *output)
wctoiso10646c(wchar_t *string, iso10646cglyphmap *map, XChar2b *output)
These take a Unicode character string and
convert it into an XChar2b glyph string suitable for
output by XDrawString16 with the ISO10646-C font from
which the iso10646cglyphmap was extracted.
ISO10646-C fonts would still be limited to having not more than 64
kibiglyphs,
but these can come from anywhere in UCS, not just from the BMP. This
solution also easily provides for glyph substitution, such that we can
finally handle the Indic fonts. It solves the huge-XFontStruct problem
of ISO10646-1, as XFontStruct grows now proportionally with the number
of glyphs, not with the highest character code. It could also provide for
simple overstriking combining characters, but then the glyphs for
combining characters would have to be stored with negative width
inside an ISO10646-C font. It can even provide support for variable
combining accent positions, by having several alternative combining
glyphs with accents at different heights for the same combining
character, with the ligature substitution tables encoding which
combining glyph to use with which base character.
TODO: write specification for ISO10646-C properties, write sample
implementations of the mapping routines, and add these to xterm, GTK,
and other applications and libraries. Any volunteers?
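Purely as a sketch of the proposal above (nothing here exists yet; all names, types, and the sorted-array representation are invented), the character-to-glyph lookup could be as simple as a binary search:

#include <stdlib.h>
#include <stdio.h>

typedef struct {
    unsigned int ucs;       /* character code (any UCS plane) */
    unsigned short glyph;   /* font-specific glyph number */
} cgpair;

typedef struct {
    cgpair *pairs;          /* sorted by ucs */
    size_t n;
} iso10646cglyphmap;

/* Binary search: character code -> glyph number (0 = .notdef). */
static unsigned short lookup_glyph(const iso10646cglyphmap *map,
                                   unsigned int ucs)
{
    size_t lo = 0, hi = map->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map->pairs[mid].ucs < ucs)       lo = mid + 1;
        else if (map->pairs[mid].ucs > ucs)  hi = mid;
        else return map->pairs[mid].glyph;
    }
    return 0;
}

int main(void)
{
    /* invented sample data; note the non-BMP entry */
    cgpair demo[] = { {0x0041, 36}, {0x20AC, 412}, {0x1D400, 900} };
    iso10646cglyphmap map = { demo, 3 };
    printf("%u\n", lookup_glyph(&map, 0x20AC));   /* prints 412 */
    return 0;
}

The map grows with the number of glyphs rather than the highest code point, which is exactly the property the ISO10646-1 XFontStruct lacks.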
Combining characters: The X11 specification does not
support combining characters in any way. The font information lacks
the data necessary to perform high-quality automatic accent placement
(as it is found, for example, in all TeX fonts). Various people have
experimented with implementing simplest overstriking combining
characters using zero-width characters with ink on the left side of
the origin, but details of how to do this exactly are unspecified
(e.g., are zero-width characters allowed in CharCell and Monospaced
fonts?) and this is therefore not yet widely established practice.
Ligatures: The Indic scripts need font file formats that
support ligature substitution, which is at the moment just as
completely out of the scope of the X11 specification as are combining
characters.
Several XFree86 team members have worked on these issues. X.Org, the official successor of the X
Consortium and the Open Group as the custodian of the X11 standards and
the sample implementation, has taken over the results or is still
considering them.
With regard to the font-related problems, the solution will
probably be to dump the old server-side font mechanisms entirely and
use instead client-side rendering with the new Xft. Another
related work in progress is the Standard Type Services (STSF)
framework that Sun has been working on.
What are useful Perl one-liners for working with UTF-8?
These examples assume that you have Perl 5.8.1 or newer and that
you work in a UTF-8 locale (i.e., “locale charmap” outputs “UTF-8”).
For Perl 5.8.0, option -C is not needed; there, the examples
work without -C in a UTF-8 locale.
You really should no longer use Perl 5.8.0, as its Unicode support had
lots of bugs.
Print the euro sign (U+20AC) to stdout:
perl -C -e 'print pack("U",0x20ac)."\n"'
perl -C -e 'print "\x{20ac}\n"'
# works only from U+0100 upwards
Locate malformed UTF-8 sequences:
perl -ne '/^(([\x00-\x7f]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf7][\x80-\xbf]{3})*)(.*)$/;print "$ARGV:$.:".($-[3]+1).":$_" if length($3)'
Locate non-ASCII bytes:
perl -ne '/^([\x00-\x7f]*)(.*)$/;print "$ARGV:$.:".($-[2]+1).":$_" if length($2)'
Convert non-ASCII characters into SGML/HTML/XML-style decimal
numeric character references (e.g., Ş becomes &#350;):
perl -C -pe 's/([^\x00-\x7f])/sprintf("&#%d;", ord($1))/ge'
Convert (hexa)decimal numeric character references to UTF-8:
perl -C -pe 's/&\#(\d+);/chr($1)/ge; s/&\#x([a-fA-F\d]+);/chr(hex($1))/ge'
How can I enter Unicode characters?
There is a range of techniques for entering Unicode characters
that are not present by default on your keyboard.
Application-independent methods
Copy-and-paste from a small file that lists your most commonly
used Unicode characters in an arrangement suited to your needs.
This is usually the most convenient and
appropriate method for very special characters that are required only
rarely, such as the more esoteric mathematical operators.
Extend your keyboard mapping using xmodmap. This is particularly
convenient if your keyboard has an AltGr key, which is meant for
exactly this purpose (some US keyboards have instead of AltGr just a
right Alt key, others lack that key entirely unfortunately, in which
case some other key must be assigned the Mode_switch function). Write
a file "~/.Xmodmap" with entries such as
keycode 113 = Mode_switch Mode_switch
keysym d = d NoSymbol degree
keysym m = m NoSymbol emdash
keysym n = n NoSymbol endash
keysym 2 = 2 quotedbl twosuperior
keysym 3 = 3 sterling threesuperior NoSymbol
keysym 4 = 4 dollar EuroSign
keysym space = space NoSymbol nobreakspace NoSymbol
keysym minus = minus underscore U2212
keycode 34 = bracketleft braceleft leftsinglequotemark leftdoublequotemark
keycode 35 = bracketright braceright rightsinglequotemark rightdoublequotemark
keysym KP_Subtract = KP_Subtract NoSymbol U2212 NoSymbol
keysym KP_Multiply = KP_Multiply NoSymbol multiply NoSymbol
keysym KP_Divide = KP_Divide NoSymbol division NoSymbol
and load it with "xmodmap ~/.Xmodmap" from your X11 startup script
into your X server. You will then find that you can easily type a
number of new characters with AltGr, for example
AltGr+space → NBSP,
AltGr+keypad-/ → ÷, and
AltGr+keypad-* → ×.
The above example file is meant for a UK keyboard, but easily
adapted to other layouts and extended with your own choice of
characters. If you use Microsoft Windows, keyboard layout tools
can be used to make similar customizations.
The ISO 14755 hexadecimal input method: hold down both the Ctrl and Shift keys while
typing the hexadecimal Unicode number. After releasing Ctrl and Shift,
you have entered the corresponding Unicode character.
This is currently implemented in GTK+ 2, and works in applications
such as GNOME Terminal, Mozilla and Firefox.
Application-specific methods
In VIM, type Ctrl-V u followed by a hexadecimal number. Example:
Ctrl-V u 20ac
In Microsoft Windows, press the Alt key while typing the decimal
Unicode number with a leading zero on the numeric keypad. Example:
press-Alt 08364 release-Alt
In Microsoft Word, type a hexadecimal number and then press Alt+X
to turn it into the corresponding Unicode character. Example: 20ac Alt-X
Are there any good mailing lists on these issues?
You should certainly be on the linux-utf8@nl.linux.org
mailing list. That’s the place to meet for everyone interested in
working towards better UTF-8 support for GNU/Linux or Unix systems and
applications. To subscribe, send a message to linux-utf8-request@nl.linux.org with the subject
subscribe. You can also browse the linux-utf8 archive and
subscribe from there via a web interface.
There is also the unicode@unicode.org
mailing list, which is the best
way of finding out what the authors of the Unicode standard and a lot
of other gurus have to say. To subscribe, send to unicode-request@unicode.org
a message with the subject line “subscribe” and the text “subscribe
YOUR@EMAIL.ADDRESS unicode”.
The relevant mailing list for discussions about Unicode support in
Xlib and the X server is now xorg at xorg.org. In the past, there were
also the fonts and i18n at
xfree86.org mailing lists, whose archives still contain valuable
information.
Further references
Bruno Haible’s Unicode HOWTO.
The Unicode Standard, Version 5.0, Addison-Wesley, 2006. You
definitely should have a copy of the standard if you are doing
anything related to fonts and character sets.
Ken Lunde’s CJKV
Information Processing, O’Reilly & Associates, 1999. This is
clearly the best book available if you are interested in East Asian
character sets.
The USENIX Winter 1993 paper by Rob Pike and Ken Thompson on the
introduction of UTF-8 into Plan 9
reports about the experience gained when Plan 9 migrated, back in 1992,
as the first operating system completely to UTF-8 (which was at
the time still called UTF-2). A must read!
OpenI18N (formerly Li18nux) is a project
initiated by several Linux distributors to enhance Unicode support for
free operating systems. It published the OpenI18N specification, as well as some patches.
The GNU C library contains definitions of all the ISO C
Amendment 1 functions, plus extensions such as wcwidth().
The Open Group’s summary of ISO
C Amendment 1.
The Unicode Consortium’s character database and mapping tables
are an essential resource for anyone developing
Unicode related tools.
Other conversion tables are available from various other sources.
Michael Everson’s Unicode and JTC1/SC2/WG2
Archive contains online versions of many of the more recent ISO
10646-1 amendments, plus many other goodies. See also his Roadmaps to the Universal Character Set.
An introduction to The
Universal Character Set (UCS).
Otfried Cheong’s essay on Han Unification
in Unicode
The STIX project of scientific and technical publishers
revised and extended the mathematical characters for Unicode 3.2 and
ISO 10646-2. They are now preparing the freely available STIX Fonts family of fully hinted
Type 1 and TrueType fonts, covering the over 7700 characters needed for
scientific publishing in a “Times compatible” design.
Jukka Korpela’s Soft hyphen (SHY) –
a hard problem? is an excellent discussion of the controversy
surrounding U+00AD.
Mark Davis discusses
the tradeoffs between UTF-8, UTF-16, and UCS-4 (now
also called UTF-32 for political reasons). Doug Ewell wrote A survey of
Unicode compression.
Alan Wood has a good page on Unicode and Multilingual
Support in Web Browsers and HTML.
ISO/IEC JTC1/SC22/WG20 produced various Unicode-related standards,
such as International String Ordering (ISO 14651) and the Cultural Convention Specification TR (ISO TR 14652), an extension
of the POSIX locale format that covers, for example, transliteration of
wide character output.
The IRG (Ideographic Rapporteur Group)
answers queries on languages, character sets and names, as does the Zvon Character Search.
China has specified in GB 18030 a new encoding of UCS for use in Chinese government
systems that is backwards-compatible with the widely used GB 2312 and
GBK encodings for Chinese. It seems though that the first version
(released 2000-03) is somewhat buggy and will likely go through a
couple more revisions, so use with care. GB 18030 is probably more of
a temporary migration path to UCS and will probably not survive for
long against UTF-8 or UTF-16, even in Chinese government systems.
The Hong Kong Supplementary Character Set (HKSCS).
Various people propose UCS alternatives: Rosetta, Bytext.
Proceedings of the International Unicode Conferences: IUC13, IUC14, IUC15, IUC16, IUC17, IUC18, etc.
This FAQ has been translated into other languages:
Korean: 2001-02
Be aware that each translation reflects only some past version of this document,
which I update occasionally.
Suggestions for improvement are welcome.
Special thanks to Ulrich Drepper, Bruno Haible, Robert Brady,
Juliusz Chroboczek, Shuhei Amakawa, Jungshik Shin, Robert Rogers, Roman
Czyborra, Josef Hinteregger and many others for valuable comments, and
to SuSE GmbH, Nürnberg, for their past support.
This work is licensed under a Creative Commons license.
http://www.cl.cam.ac.uk/~mgk25/unicode.html
