CJKWORD Learning book
Table of Contents
- Author's Foreword
- Features of This Technology
- Company Overview
- Business Goals
- CAW+ General Overview
- Understanding the Concept of CAW + ALPHABET 184
- Running CAW + ALPHABET 184
- Combination Principles of CAW + ALPHABET 184
- Variant Characters in CAW + ALPHABET 184
- Classification Method of CAW + ALPHABET 184
- CAW + ALPHABET 184 Alphabetical Order
- CAW + ALPHABET 184 Pronunciation Classification
- Added Notation and Easily Confused Notation
- Easily Confused Characters
Author's Foreword
According to ancient history books such as the Hwandan Gogi, the Korean people are recorded as having governed the languages of the world, and their contributions to the creation of Hangeul and the formation of Chinese characters (漢字) demonstrate their linguistic and script heritage.
Chinese characters were used as the national script for over a thousand years and have had a profound impact on the entire population, which makes the technology for inputting characters, including Hanja, a historical task that cannot be abandoned. Documents and books from before the modern era were all written in Chinese characters, and Hanja input technology is essential for digitizing them.
The current input method relies on the phonetic value (音價) of characters, regardless of their original form or meaning, which makes input slow and homonyms difficult to handle. As Chinese characters are the oldest and most widely used script, the problem of Hanja input remains a top global issue.
Numerous countries, research institutions, universities, and companies have attempted to solve this problem, and it was even pursued as a national project in Korea, but it remained unsolved. However, our company has meticulously analyzed complex Chinese characters, created the CAW alphabet based on their meaning and phonetic values, and developed a technology for inputting Chinese characters directly by combining these alphabet characters, which we are now introducing to the public.
Features of This Technology
- You can use it easily without knowing the Chinese or Hanja pronunciation of a character.
- A frequent question from people unfamiliar with Chinese characters is whether they can input characters they do not know. They cannot, just as you cannot type Hangeul on a computer or smartphone if you do not know Hangeul. For that reason, this technology also becomes an important tool for learning Chinese characters.
- It is needed for education in elementary, middle, and high schools, and for various personal document editing needs.
- Hanja, Japanese, Hangeul, and English characters are all input directly, so typing is enjoyable and free of the inconvenience of selecting from a candidate list.
- As of December 2017, this technology has been published for personal use, with our company's CAW Chinese software included in a book.
- It focuses on Hanja learning, Hanja typing, and Japanese typing, making it suitable for beginners.
- Beyond its Hanja learning function, this technology will be developed further and applied to smartphones, word processors, databases, cloud services, IoT, and the internet.
- It is a crucial task for us, the descendants living today, to prevent the nation's traditional knowledge from being buried and lost within the Hanja script.
Company Overview
- 2016.09: Selected as an excellent SW company by the National IT Industry Promotion Agency and received a technology value assessment.
- 2017.05: Selected as a K-Global Re-Start company.
- 2017.06: Established Anyword Co., Ltd.
- 2017.09: Selected as a "Saessak" (Sprout) company by the Public Procurement Service.
Anyword Co., Ltd. is a software development company that has dramatically improved Chinese/Hanja input, a problem that is emerging as a major global concern. It is a strong small company that can not only develop and sell its products directly but also sell its technology to software companies worldwide.
Business Goals & Strategy
Business Goals
- Increase sales of CAW Chinese SW.
- Distribute CAW Chinese SW, acquire various certifications, create a promotional website, and secure clients.
- Seek methods to monetize currently free services and improve product packaging and design.
Business Content
- Launch Chinese input solutions and input/translation editor products.
- Secure funding and talent to advance the business.
- Prepare to expand the CAW user base and to monetize it.
Execution Method
- Introduce alternative methods of learning Hanja sound, meaning, and writing for elementary, middle, and high school students and for Hanja proficiency test learners, and develop content to meet learning demands.
- Promote the software as optimal for the digitalization of classical documents.
- Promote the Chinese input solution to media companies, research institutes, and corporations.
- Spark interest among the younger generation by changing the Hanja input method.
- Promote the need for the Chinese language arising from China's globalization, and encourage changes in domestic and international Hanja education methods.
Expected Effects
Domestic:
- Fulfill the demand for Hanja learning for everyone from elementary school students to adults preparing for employment and Hanja proficiency tests.
- Fulfill the demand for traditional educational content such as the Myeongsimbogam.
- Popularize forgotten traditional culture and content by digitalizing classical documents.
International:
- Meet the global demand for Chinese language learning and Chinese character processing.
- Resolve the inconvenience of Chinese input on smartphones.
- Address the Chinese language issues in current application software such as Word, Hangeul, and Apple Word.
- Address Chinese language issues in various development languages for internet browsers.
CAW+ General Overview
What is CAW+?
CAW+ is a platform built on the CAW+ alphabet, a collection of the smallest common shapes (character pieces) obtained by decomposing Chinese characters, arranged according to frequency. The alphabet characters are combined in the way that gives the most efficient input, so users can learn and use the method quickly with minimal effort. As a result, every character can be created simply by combining 184 alphabet characters, without the need for a large set of symbols.
Understanding the Concept of CAW + ALPHABET 184
The ALPHABET in CAW+ refers to the decomposed letters that make up a single Chinese character. The CAW Alphabet consists of characters that cannot be broken down any further, and each is assigned a unique code. However, inputting difficult and complex Chinese characters with only these decomposed characters led to code duplication and confusion. For that reason, some Chinese characters formed by combining several other characters were also designated as CAW Alphabet characters.
In total, 184 CAW Alphabet characters are used to write Chinese sentences. Once you are familiar with the CAW Alphabet, you can quickly and easily input even difficult and complex Chinese characters by inferring the code from the alphabet characters that make up each character.
Alphabet Derivation Method
The 1,800 educational Hanja are basic, high-frequency characters, so decomposition started from them. These 1,800 characters were not initially broken into small shapes like the current ones. Instead, they were first split into distinguishable characters, and the collected pieces were then broken down further to create a preliminary alphabet. Next, the 3,500 Hanja used in the Hanja Proficiency Test were decomposed and reassembled using this preliminary alphabet, producing a secondary alphabet. It is about 95% the same as the first one, but even a 1% difference carries enough weight to change the whole system: changing one combination method can alter dozens or hundreds of combinations.
The alphabet created in this way was then applied to KS C 5601 (4,888 characters), China's commonly used characters (3,500 characters), and IICore (9,710 characters) to produce the current alphabet. It is therefore designed to apply as efficiently as possible to all of these characters.
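The derivation described above (decompose characters until the pieces are indivisible, then rank the pieces by how often they occur) can be sketched in Python. This is only an illustration under stated assumptions: the `DECOMPOSITION` table, the four sample characters, and the function names are hypothetical and are not taken from the CAW code table.

```python
from collections import Counter

# Hypothetical decomposition table for illustration only.
# Any character missing from the table is treated as indivisible.
DECOMPOSITION = {
    "相": ["木", "目"],
    "林": ["木", "木"],
    "森": ["木", "林"],
    "明": ["日", "月"],
}


def decompose(char):
    """Recursively break a character down into indivisible components."""
    parts = DECOMPOSITION.get(char)
    if parts is None:
        return [char]
    result = []
    for part in parts:
        result.extend(decompose(part))
    return result


def component_frequencies(chars):
    """Count how often each indivisible component occurs across chars."""
    counts = Counter()
    for ch in chars:
        counts.update(decompose(ch))
    return counts


freq = component_frequencies(["相", "林", "森", "明"])
# Components ordered from most to least frequent.
ranked = [component for component, _ in freq.most_common()]
```

In this toy corpus 木 is by far the most frequent component, so it would be the first candidate for the alphabet. Changing a single entry in the table changes the counts for every character that contains it, which is the sensitivity the text describes.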
Running CAW + ALPHABET 184
Finding the Code Table:
- Double-click CAW25.exe to run it.
- Click on "File" at the top.
- Click on "Code Table".
Combination Principles of CAW + ALPHABET 184
1. Additive Method
Adding 目 (mu) to 木 (mm) creates the character 相. With CAW, you simply type "mmmu", the codes of the familiar characters 木 and 目 in sequence.
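The additive method amounts to concatenating component codes and looking up the result. A minimal sketch, assuming a dictionary-based lookup: only the codes "mm", "mu" and the result 相 come from the example above, while the table and function names are hypothetical.

```python
# Component codes from the example above; a full table would hold all 184.
COMPONENT_CODES = {"木": "mm", "目": "mu"}

# Hypothetical lookup from a typed code sequence to the resulting character.
COMBINED = {"mmmu": "相"}


def code_for(components):
    """Concatenate the codes of the components in writing order."""
    return "".join(COMPONENT_CODES[c] for c in components)


def lookup(typed):
    """Return the character for a typed code, or None if it is unknown."""
    return COMBINED.get(typed)


typed = code_for(["木", "目"])
character = lookup(typed)
```

Here `typed` is "mmmu" and `character` is 相, matching the example in the text.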
2. Substitution/Transformation Method
The character 騰 (to rise) is similar in shape to 勝 (to win), with 力 replaced by 馬. This is called the substitution/transformation method. The character 勝 is formed by 月+八, and adding 馬 creates the character 騰.
3. Variant Characters
One reason direct input of Hanja has not been possible is the existence of many variant characters. For example, 峰 (peak) and 峯 have the same sound and meaning. When two characters would use the same combination, an extra character is added to mark the second form. Thus 峰 is created by combining 山 with another component, and adding one more component creates 峯. The pair 鳥 (bird) and 烏 (crow) is resolved with the same second-form method: based on frequency, 鳥 becomes the alphabet character and 烏 becomes the second-form character.
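The second-form rule can be sketched as collision handling in a lookup table. This is an illustration only. It assumes the marker is the component 二, as the notation section of this book describes, and that the component shared by 峰 and 峯 is 夆, which this passage does not name. The function name is hypothetical.

```python
# Assumed marker for a second form.
SECOND_FORM = "二"


def build_table(entries):
    """Build a lookup from component sequences to characters.

    entries are (components, character) pairs, most frequent first.
    When a sequence is already taken, the marker is appended.
    Only one variant per sequence is handled in this sketch.
    """
    table = {}
    for components, char in entries:
        key = tuple(components)
        if key in table:
            key = key + (SECOND_FORM,)
        table[key] = char
    return table


table = build_table([
    (["山", "夆"], "峰"),  # more frequent form keeps the plain combination
    (["山", "夆"], "峯"),  # variant receives the second-form marker
])
```

With this table, the plain combination yields 峰, and the same combination followed by the marker yields 峯.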
CAW + ALPHABET 184 Classification Method
Classification Method:
- ① Prime Code Characters (Classification by frequency): The frequencies of the 184 characters in the CAW alphabet are calculated, and these codes are applied to the most frequently used group of characters. (Frequency here means the number of times a character appears when 11,000 Hanja are broken down into their indivisible components.)
- ② Pronunciation Code Characters (Classification by frequency): This is a code based on the original pronunciation, to which the five vowels A, E, I, O, U are applied. This mapping is likewise applied to the character with the highest frequency.
- ③ Vowel Group Code Characters (mapping to the five vowel groups a, e, i, o, u): Since the code "ba" is already applied to the character 八, a character with the sound "bai" takes "s", which corresponds to the second "a" sound, giving the code "bs". The code mapping principle is based on the Yin-Yang and Five Elements of Eastern classics. As seen in the code "ba", the first letter "b" corresponds to a consonant and the second letter "a" to a vowel. If the plain vowel is not available as the second letter, another code from the same vowel group is assigned in place of "a" according to this principle. Once you learn the pinyin of the 184 characters, you can infer the correct code even if you do not remember it exactly, so the system can be mastered within a day.
Classification Details
Prime Code Characters (Classification 1 by Frequency)
This group represents the most frequently used characters that form the basis of Chinese writing, typically composed of up to 4 strokes. For example, the knife radical (刀), with pinyin [dāo], has the highest frequency among characters starting with 'd', so it is assigned the code 'dd'. This classification consists of 20 codes created by doubling a letter (bb, cc, dd, etc.), one for every letter of the alphabet except A, F, N, O, T, and V.
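The count of 20 prime codes follows directly from the rule: 26 letters minus the 6 excluded ones, each doubled. A short sketch that enumerates them is given below. The variable names are hypothetical, and the mapping of each code to a character is left to the code table.

```python
import string

# Letters the text lists as having no prime code.
EXCLUDED = set("afnotv")

# One doubled-letter code per remaining letter: bb, cc, dd, ...
PRIME_CODES = [ch * 2 for ch in string.ascii_lowercase if ch not in EXCLUDED]
```

The resulting list starts with "bb", contains "dd" (the code given for 刀), and has exactly 20 entries.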
Pronunciation Code Characters (Classification 2 by Frequency)
This is a classification where the five vowels A, E, I, O, U are applied to the code based on the character's pronunciation and frequency. For example, the character for eight (八), with pinyin 'ba', is assigned the code 'ba' due to its high frequency. This classification consists of the characters used next most frequently after the prime code characters. The group has a total of 63 codes such as ba, be, bi, bo, bu, ca, ce, etc., covering all letters of the alphabet except A, I, J, K, U, and V.
Vowel Group Code Characters (5 Vowel Group Code Mapping)
This system groups characters into five "vowel groups" based on their proximity on the QWERTY keyboard layout. The code for a character is formed by combining an initial consonant with a letter from one of the vowel groups.
- A-row Vowel Group: Corresponds to the keys `a, s, d, f, g`. This group has 18 codes.
- E-row Vowel Group: Corresponds to the keys `e, t, r, w, q`. This group has 16 codes.
- I-row Vowel Group: Corresponds to the keys `h, l, j, k, i`. This group has 33 codes.
- O-row Vowel Group: Corresponds to the keys `o, p, b, n, m`. This group has 9 codes.
- U-row Vowel Group: Corresponds to the keys `u, x, y, z, c`. This group has 23 codes.
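The fallback from "ba" to "bs" can be sketched as a search through the keys of a vowel group. The key lists below are taken from the rows above. The function name, and the assumption that the plain vowel is tried first and the remaining keys are then tried in the listed order, are illustrative guesses rather than the documented CAW rule.

```python
# Keys of each vowel group, in the order listed above.
VOWEL_GROUPS = {
    "a": "asdfg",
    "e": "etrwq",
    "i": "hljki",
    "o": "opbnm",
    "u": "uxyzc",
}


def assign_code(consonant, vowel, taken):
    """Return the first free two-letter code in the vowel's group.

    Assumption: the plain vowel is tried first, then the other keys
    of the group in their listed order. Returns None if all are taken.
    """
    keys = VOWEL_GROUPS[vowel]
    ordered = vowel + keys.replace(vowel, "")
    for key in ordered:
        code = consonant + key
        if code not in taken:
            return code
    return None


first = assign_code("b", "a", set())    # nothing taken yet
second = assign_code("b", "a", {"ba"})  # "ba" already belongs to 八
```

With "ba" already assigned to 八, the next character with an a-group sound receives "bs", as in the example in the classification overview. The five groups together use 25 distinct keys, leaving only `v` unused.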
Added Notation and Easily Confused Notation
This section explains where the combination method has changed in recent versions and which parts are easily confused.
- The character 支 (branch) uses the combination of 士 and 乂.
- The character 囱 (chimney) uses the combination of 白 and 夂. A combination of 丶 and 口 is also possible, but priority was given to the one that reveals the overall outline.
- For the character 主 (master), the combination of 亠 and 土 has been added alongside the combination of 丶 and 王.
- 末 (end) uses 十木, 未 (not yet) uses 一木, and 耒 (plow) uses 二木.
- For the character 甫 (great), the combination of 十 and 用 has been added alongside the combination of 車 and 丶.
- For the character 坐 (to sit), the combination 人人土 has been added alongside the combination of 土 and 人.
- For the character 唐 (Tang dynasty), the combination of 广 and 聿 has been added alongside the combination of 广 and 彐.
- The character 敖 (to ramble) uses 士方.
- Combinations using 丂.
- If the components of a combination are the same, 二 is added to express the meaning of a "second form."
- Variant characters with the same meaning but different forms, and characters with similar forms.
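Several items above add a second combination for the same character, while others separate look-alike characters by their first component. Both cases can be sketched as one lookup table in which several keys may point to the same character. The component sequences are the ones listed above. The table layout and function name are hypothetical and do not describe the software's internal format.

```python
# Component sequences from the notes above, mapped to their characters.
COMBINATIONS = {
    ("丶", "王"): "主",  # original combination
    ("亠", "土"): "主",  # added alternative for the same character
    ("十", "木"): "末",
    ("一", "木"): "未",
    ("二", "木"): "耒",
}


def resolve(components):
    """Return the character for a component sequence, or None."""
    return COMBINATIONS.get(tuple(components))


def combinations_for(char):
    """List every component sequence that produces the given character."""
    return [key for key, value in COMBINATIONS.items() if value == char]
```

For example, `resolve(["亠", "土"])` and `resolve(["丶", "王"])` both give 主, while 末, 未, and 耒 stay distinct because only their first component differs.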
Easily Confused Characters
The characters listed here are those that are used as components elsewhere or are not easily recalled. By typing them yourself, you can understand their combination methods and apply them when similarly shaped characters appear.
1) High-frequency characters
(This section lists examples of high-frequency characters that might be confusing.)
2) Expressions for characters with three repeating shapes
(This section lists examples of characters formed by repeating a component three times.)
3) Expressions for simplified or related characters
(This section lists examples of simplified characters or characters related in form.)
4) Others
(This section lists other miscellaneous characters that are easily confused.)