Info

  • Author : 천용희 (Yonghee Cheon)
  • Type : Study notes
  • Description : A brief explanation of LDA


Practice code

  • Apart from using the gensim library, the preprocessing is the same as in the LSA explanation.
import pickle
from gensim import corpora, models

# Load the preprocessed news corpus; the with block closes the file on exit.
with open('data/naver_news.pickle', 'rb') as f:
    corpus = pickle.load(f)

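# `tok` is assumed to be the tokenizer object from the LSA preprocessing (not defined in this snippet).
# Keep only the surface form of tokens whose POS tag is in tok.include_poses.
tokens_matrix = [[t[0] for t in corp['tokens'] if t[1] in tok.include_poses] for corp in corpus]
# Build the token-to-id dictionary and drop tokens that appear in fewer than 3 documents.
dictionary_LDA = corpora.Dictionary(tokens_matrix)
dictionary_LDA.filter_extremes(no_below=3)
# Convert each document into a bag-of-words list of (token_id, count) pairs.
bow_matrix = [dictionary_LDA.doc2bow(list_of_tokens) for list_of_tokens in tokens_matrix]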

num_topics = 10
# alpha: document-topic prior (one value per topic); eta: topic-word prior (one value per dictionary token).
lda_model = models.LdaModel(bow_matrix, num_topics=num_topics,
                            id2word=dictionary_LDA,
                            passes=4, alpha=[0.01] * num_topics,
                            eta=[0.01] * len(dictionary_LDA))

for i, topic in lda_model.show_topics(formatted=True, num_topics=num_topics, num_words=10):
    print(str(i) + ": " + topic)
  • output
0: 0.025*"대통령" + 0.016*"평가" + 0.016*"채용" + 0.014*"부정" + 0.013*"조사" + 0.011*"국공" + 0.010*"정규직" + 0.009*"결과" + 0.008*"포인트" + 0.008*"전환"
1: 0.024*"경제" + 0.016*"의원" + 0.006*"정치" + 0.005*"전" + 0.005*"한국" + 0.005*"국회" + 0.005*"미래" + 0.005*"정부" + 0.005*"민주당" + 0.004*"장관"
2: 0.014*"경제" + 0.012*"지역" + 0.011*"미국" + 0.008*"확산" + 0.007*"환자" + 0.006*"정부" + 0.006*"지원" + 0.006*"연합뉴스" + 0.005*"신규" + 0.005*"대학"
3: 0.016*"경제" + 0.010*"가맹점" + 0.009*"전망" + 0.007*"지원" + 0.007*"기업" + 0.006*"지역" + 0.006*"올해" + 0.006*"상품권" + 0.005*"성장" + 0.005*"시장"
4: 0.018*"메뉴" + 0.014*"빙수" + 0.011*"출시" + 0.010*"제공" + 0.009*"맛" + 0.009*"카페" + 0.009*"매장" + 0.008*"점" + 0.008*"가맹" + 0.008*"배달"
5: 0.044*"확진" + 0.013*"감염" + 0.011*"지역" + 0.011*"발생" + 0.010*"광주" + 0.010*"판정" + 0.009*"검사" + 0.008*"서울" + 0.008*"방역" + 0.007*"교회"
6: 0.018*"홍보" + 0.013*"국세청" + 0.011*"대사" + 0.010*"성실" + 0.009*"활동" + 0.009*"모델" + 0.007*"경제" + 0.007*"서울" + 0.007*"선정" + 0.006*"국민"
7: 0.037*"정규직" + 0.021*"전환" + 0.019*"비정규직" + 0.014*"보안" + 0.013*"청년" + 0.013*"의원" + 0.012*"국공" + 0.011*"검색" + 0.010*"취업" + 0.010*"공정"
8: 0.017*"대통령" + 0.013*"정치" + 0.010*"지방" + 0.009*"거버넌스" + 0.008*"시장" + 0.007*"자치" + 0.007*"주민" + 0.007*"경제" + 0.007*"대상" + 0.006*"문"
9: 0.011*"대표" + 0.010*"검찰" + 0.009*"민주당" + 0.009*"정치" + 0.008*"국회" + 0.008*"통합" + 0.007*"장관" + 0.007*"수사" + 0.007*"위원장" + 0.007*"의원"
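
For reference, the dictionary and bag-of-words steps in the code above can be pictured with a small pure-Python sketch. This is not gensim's implementation: the helper names `build_vocab` and `doc_to_bow` and the toy documents are made up for illustration, ids are assigned alphabetically here (gensim assigns them differently), and only the `no_below` rule is reproduced. gensim's `filter_extremes` also applies its default `no_above=0.5` and `keep_n=100000` limits, which this sketch leaves out.

```python
from collections import Counter


def build_vocab(docs, no_below=3):
    """Map each token found in at least `no_below` documents to an integer id.

    Hypothetical helper: mimics only the `no_below` part of filter_extremes.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each token once per document
    kept = sorted(t for t, df in doc_freq.items() if df >= no_below)
    return {token: idx for idx, token in enumerate(kept)}


def doc_to_bow(doc, vocab):
    """Return sorted (token_id, count) pairs, skipping tokens not in the vocab."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())


# Toy documents (illustrative only, not from the news corpus).
toy_docs = [
    ["economy", "growth", "market"],
    ["economy", "market", "market"],
    ["economy", "virus"],
]

toy_vocab = build_vocab(toy_docs, no_below=2)
toy_bow = [doc_to_bow(d, toy_vocab) for d in toy_docs]

print(toy_vocab)  # {'economy': 0, 'market': 1}
print(toy_bow)    # [[(0, 1), (1, 1)], [(0, 1), (1, 2)], [(0, 1)]]
```

Each inner list has the same shape as one entry of `bow_matrix`: the token ids present in a document paired with their counts. Rare tokens such as "growth" and "virus" are dropped before the model ever sees them.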