Strings, bytes, runes and characters in Go - (4) Range loops, Libraries

Strings, bytes, runes and characters in Go

Range loops

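One place where Go treats UTF-8 specially is the for range loop over a string.

A for range loop decodes one UTF-8-encoded rune on each iteration. On each pass, the loop index is the byte offset at which the current rune starts, and the loop value is that rune's code point.

Consider the following example, which uses the %#U Printf format to show each code point's Unicode value and its printed representation.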

package main
 
import "fmt"
 
func main() {
    const nihongo = "日本語"
    for index, runeValue := range nihongo {
        fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
    }
    fmt.Printf("\n")
}

Running the code above produces the following output. Note that each code point occupies multiple bytes: the byte positions advance by 3 each time.

$ ./strings
U+65E5 '日' starts at byte position 0
U+672C '本' starts at byte position 3
U+8A9E '語' starts at byte position 6

Libraries

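Go's standard library provides strong support for interpreting UTF-8 text. If a for range loop alone is not sufficient for your purposes, the facility you need is likely provided by a package in the library.

The most important such package is unicode/utf8, which contains helper routines to validate, disassemble, and reassemble UTF-8 strings. Here is a program equivalent to the for range example above, but using the DecodeRuneInString function from that package to do the work.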

package main
 
import (
    "fmt"
    "unicode/utf8"
)
 
func main() {
    const nihongo = "日本語"
    for i, w := 0, 0; i < len(nihongo); i += w {
        runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
        fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
        w = width
    }
    fmt.Printf("\n")
}
 

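DecodeRuneInString returns the rune and its width in UTF-8-encoded bytes. Running the program produces the same output as the previous example.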

$ ./strings
U+65E5 '日' starts at byte position 0
U+672C '本' starts at byte position 3
U+8A9E '語' starts at byte position 6

To learn more about the unicode/utf8 package, see the following link.

https://golang.org/pkg/unicode/utf8/

License: Attribution, NonCommercial, NoDerivatives
