or1ko's diary

Writing about my days

04-06

言語処理100本ノック 2015 (NLP 100 Exercise 2015)

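04. Atomic symbols
Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character from words 1, 5, 6, 7, 8, 9, 15, 16, and 19, and the first two characters from every other word. Then build an associative array (a dictionary or map type) from each extracted string to the position of its word (counting from the start of the sentence).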

04.hs

import qualified Data.Map as M

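-- Build a map from each extracted prefix to its 1-based word position.
main :: IO ()
main = print $ M.fromList $ zip (map f $ zip (words s) [1..]) [1..]
  where
    s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

-- One character for the listed positions, two for every other word.
f :: (String, Int) -> String
f (s, n)
  | n `elem` ns = take 1 s
  | otherwise   = take 2 s
  where
    ns = [1, 5, 6, 7, 8, 9, 15, 16, 19]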

> runghc 04.hs
fromList [("Al",13),("Ar",18),("B",5),("Be",4),("C",6),("Ca",20),("Cl",17),("F",9),("H",1),("He",2),("K",19),("Li",3),("Mi",12),("N",7),("Na",11),("Ne",10),("O",8),("P",15),("S",16),("Si",14)]

I couldn't think of a simple way to write this in ghci, so I put it in a file.
It had been a while since I last used a dictionary.
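
For comparison, here is a rough sketch of a more compact variant that pairs words with positions through zipWith, so the core fits on a single line. The names `singles`, `symbols`, and `sentence` are made up for illustration and are not part of 04.hs.

```haskell
import qualified Data.Map as M

-- Hypothetical name: positions whose word contributes a single character.
singles :: [Int]
singles = [1, 5, 6, 7, 8, 9, 15, 16, 19]

-- Map each extracted prefix to the 1-based position of its word.
symbols :: String -> M.Map String Int
symbols str = M.fromList (zipWith entry [1 ..] (words str))
  where
    entry n w = (take (if n `elem` singles then 1 else 2) w, n)

-- The sentence from the problem statement.
sentence :: String
sentence = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

main :: IO ()
main = print (symbols sentence)
```

Running it should print the same map as 04.hs. Note that "Might" yields "Mi" at position 12, exactly as the problem's rules dictate.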

05. n-gram
Write a function that builds n-grams from a given sequence (a string, a list, and so on). Use it to obtain the word bi-grams and the character bi-grams of the sentence "I am an NLPer".

Prelude> import Data.List
Prelude Data.List> let f s = filter (\x -> length x == 2) $ map (take 2) $ tails s
Prelude Data.List> f $ words "I am an NLPer"
[["I","am"],["am","an"],["an","NLPer"]]
Prelude Data.List> f $ "I am an NLPer"
["I "," a","am","m "," a","an","n "," N","NL","LP","Pe","er"]
Prelude Data.List>
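
The problem asks for a function that works for any n, while the session above fixes n at 2. Below is a minimal sketch of one way to generalize it, taking an n-gram to mean n adjacent elements. The name `ngram` is an illustrative choice, not something from the session.

```haskell
import Data.List (tails)

-- All contiguous windows of length n, in order of appearance.
-- Works on any list, so it covers both words and characters.
ngram :: Int -> [a] -> [[a]]
ngram n xs
  | n <= 0    = []
  | otherwise = takeWhile ((== n) . length) (map (take n) (tails xs))

main :: IO ()
main = do
  print (ngram 2 (words "I am an NLPer"))  -- [["I","am"],["am","an"],["an","NLPer"]]
  print (ngram 2 "I am an NLPer")          -- character bi-grams, spaces included
  print (ngram 3 "NLPer")                  -- ["NLP","LPe","Per"]
```

Since the windows taken from `tails` only get shorter towards the end, `takeWhile` stops at the first window that is too short, and an input shorter than n gives an empty list.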

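06. Sets
Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph", respectively. Find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is contained in X and in Y.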

06.hs

import Data.List

main = do 
  print $ union x y
  print $ intersect x y
  print $ x \\ y
    where
      x = nub $ bigram "paraparaparadise"
      y = nub $ bigram "paragraph"

-- Adjacent pairs only: keep the length-2 prefixes of each tail.
bigram = filter (\x -> length x == 2) . map (take 2) . tails

> runghc 06.hs
["pa","ar","ra","ap","ad","di","is","se","ag","gr","ph"]
["pa","ar","ra","ap"]
["ad","di","is","se"]
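
The last part of the problem, whether 'se' belongs to X and to Y, is not covered by 06.hs. Here is a minimal sketch of that check, assuming bi-grams of adjacent characters and using Data.Set from the containers package instead of lists. The name `bigrams` is hypothetical.

```haskell
import qualified Data.Set as S

-- Set of adjacent character pairs in a string.
bigrams :: String -> S.Set String
bigrams str = S.fromList (zipWith (\a b -> [a, b]) str (drop 1 str))

main :: IO ()
main = do
  let x = bigrams "paraparaparadise"
      y = bigrams "paragraph"
  print (S.member "se" x)  -- True
  print (S.member "se" y)  -- False
```

With sets, duplicates disappear on construction, so no `nub` is needed, and `S.union`, `S.intersection`, and `S.difference` would cover the other three operations.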