Info

  • Author: 천용희 (Yonghee Cheon)
  • Description
    • Building on the earlier Naver movie review classification exercise, I re-coded the whole pipeline end-to-end on a new dataset to get more comfortable with it.
  • Improvements
    • The train, validation, and test sets are preprocessed together first and only then split, in a single pass.

Preparation

Importing the required modules

  • We use roughly the same packages as in the previous Naver movie review classification exercise.
import re
import random
import time
import datetime

import pandas as pd
import numpy as np
import torch
import matplotlib.pyplot as plt

from scipy import stats
from transformers import BertTokenizer
from transformers import BertForSequenceClassification, AdamW, BertConfig
from transformers import get_linear_schedule_with_warmup
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.datasets import imdb
from sklearn.model_selection import train_test_split

Separator regex as a function

  • Sentences are split on punctuation marks.
  • Inspecting the text, I judged that a sentence break also occurs wherever a <br> tag appears.
def separator(text):
    # Replace sentence-ending punctuation and <br /> tags with BERT's [SEP] token.
    # A raw string and a non-greedy tag pattern keep the match from swallowing
    # everything between the first and last <br /> in a review.
    return re.sub(pattern=r"<b.*?/>|\.|\?|\!", string=text, repl=" [SEP] ")
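A quick illustration of what separator does (the sample text is shortened from one of the reviews; the exact spacing of the result is not important):

# Every '.', '?', '!' and <br /> tag in the input becomes " [SEP] "
print(separator("A wonderful little production.<br /><br />The filming technique is very unassuming!"))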

Loading the IMDB data

  • IMDB data in the form I wanted did not turn up as easily as expected.
  • For this exercise I used a dataset from Kaggle.
df = pd.read_csv('imdbdata.csv', header=0)
df.head(5)

Preprocessing

Variable assignment and label encoding

  • The review column holds the actual review text, and the sentiment column holds the positive/negative label.
  • The label is encoded as 1 for positive and 0 for negative.
texts = df.review
classes = df.sentiment
classes_oh = df.sentiment.apply(lambda x: 1 if x == 'positive' else 0)
classes_oh = classes_oh.values
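Before modeling, it can also help to confirm the class balance; a small sanity check not in the original post:

# How many positive vs. negative reviews, and the first few encoded labels
print(df.sentiment.value_counts())
print(classes_oh[:5])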

BERT tokenization function

  • This function formats text the way BERT's classification model expects: a [CLS] token at the front and a [SEP] token at the end.
def bert_tokenize(texts):
    return [tokenizer.tokenize("[CLS] " + text + " [SEP]") for text in texts]

Tokenization and id assignment

  • In my NLP work so far, uncased models have usually done well on this kind of task, so I use the uncased model.
  • Because the model is uncased, the text is lowercased.
  • The data is English, so the multilingual model is not needed.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)
tokenized = bert_tokenize(texts)
bert_ids = [tokenizer.convert_tokens_to_ids(tokens) for tokens in tokenized]

Checking the length distribution to choose the padding size

  • We need the distribution of token counts in this data before we can pick the MAX_LEN used for padding.
  • Based on the distribution I set MAX_LEN to 512, which is also the longest input bert-base accepts.
number_of_tokens = np.array([len(bert_id) for bert_id in bert_ids])
stats.describe(number_of_tokens)
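Besides stats.describe, a histogram makes the cutoff easier to judge. A minimal sketch using the matplotlib import above:

# Distribution of tokens per review, with BERT's 512-token limit marked
plt.hist(number_of_tokens, bins=50)
plt.axvline(512, color='red')
plt.xlabel('tokens per review')
plt.ylabel('number of reviews')
plt.show()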

Padding

  • With both truncating and padding set to 'post', sequences are filled from the front as intended: anything longer than MAX_LEN is cut off at the end, and anything shorter is padded with 0 at the end.
MAX_LEN = 512
padded_bert_ids = pad_sequences(bert_ids, maxlen=MAX_LEN, dtype='long',
                                truncating='post', padding='post')
padded_bert_ids[0]
  • output
array([  101,  2028,  1997,  1996,  2060, 15814,  2038,  3855,  2008,
        2044,  3666,  2074,  1015, 11472,  2792,  2017,  1005,  2222,
        2022, 13322,  1012,  2027,  2024,  2157,  1010,  2004,  2023,
        2003,  3599,  2054,  3047,  2007,  2033,  1012,  1026,  7987,
        1013,  1028,  1026,  7987,  1013,  1028,  1996,  2034,  2518,
        2008,  4930,  2033,  2055, 11472,  2001,  2049, 24083,  1998,
        4895, 10258,  2378,  8450,  5019,  1997,  4808,  1010,  2029,
        2275,  1999,  2157,  2013,  1996,  2773,  2175,  1012,  3404,
        2033,  1010,  2023,  2003,  2025,  1037,  2265,  2005,  1996,
        8143, 18627,  2030,  5199,  3593,  1012,  2023,  2265,  8005,
        2053, 17957,  2007, 12362,  2000,  5850,  1010,  3348,  2030,
        4808,  1012,  2049,  2003, 13076,  1010,  1999,  1996,  4438,
        2224,  1997,  1996,  2773,  1012,  1026,  7987,  1013,  1028,
        1026,  7987,  1013,  1028,  2009,  2003,  2170, 11472,  2004,
        2008,  2003,  1996,  8367,  2445,  2000,  1996, 17411,  4555,
        3036,  2110,  7279,  4221, 12380,  2854,  1012,  2009,  7679,
        3701,  2006, 14110,  2103,  1010,  2019,  6388,  2930,  1997,
        1996,  3827,  2073,  2035,  1996,  4442,  2031,  3221, 21430,
        1998,  2227, 20546,  2015,  1010,  2061,  9394,  2003,  2025,
        2152,  2006,  1996, 11376,  1012,  7861,  2103,  2003,  2188,
        2000,  2116,  1012,  1012, 26030,  2015,  1010,  7486,  1010,
       18542, 10230,  1010,  7402,  2015,  1010,  8135,  1010, 16773,
        1010,  3493,  1998,  2062,  1012,  1012,  1012,  1012,  2061,
        8040, 16093, 28331,  1010,  2331, 14020,  1010, 26489,  6292,
       24069,  1998, 22824, 10540,  2024,  2196,  2521,  2185,  1012,
        1026,  7987,  1013,  1028,  1026,  7987,  1013,  1028,  1045,
        2052,  2360,  1996,  2364,  5574,  1997,  1996,  2265,  2003,
        2349,  2000,  1996,  2755,  2008,  2009,  3632,  2073,  2060,
        3065,  2876,  1005,  1056,  8108,  1012,  5293,  3492,  4620,
        4993,  2005,  7731,  9501,  1010,  5293, 11084,  1010,  5293,
        7472,  1012,  1012,  1012, 11472,  2987,  1005,  1056,  6752,
        2105,  1012,  1996,  2034,  2792,  1045,  2412,  2387,  4930,
        2033,  2004,  2061, 11808,  2009,  2001, 16524,  1010,  1045,
        2481,  1005,  1056,  2360,  1045,  2001,  3201,  2005,  2009,
        1010,  2021,  2004,  1045,  3427,  2062,  1010,  1045,  2764,
        1037,  5510,  2005, 11472,  1010,  1998,  2288, 17730,  2000,
        1996,  2152,  3798,  1997,  8425,  4808,  1012,  2025,  2074,
        4808,  1010,  2021, 21321,  1006, 15274,  4932,  2040,  1005,
        2222,  2022,  2853,  2041,  2005,  1037, 15519,  1010, 13187,
        2040,  1005,  2222,  3102,  2006,  2344,  1998,  2131,  2185,
        2007,  2009,  1010,  2092,  5450,  2098,  1010,  2690,  2465,
       13187,  2108,  2357,  2046,  3827,  7743,  2229,  2349,  2000,
        2037,  3768,  1997,  2395,  4813,  2030,  3827,  3325,  1007,
        3666, 11472,  1010,  2017,  2089,  2468,  6625,  2007,  2054,
        2003,  8796, 10523,  1012,  1012,  1012,  1012,  2008,  2015,
        2065,  2017,  2064,  2131,  1999,  3543,  2007,  2115,  9904,
        2217,  1012,   102,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0,     0,
           0,     0,     0,     0,     0,     0,     0,     0])
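As a sanity check (not in the original post), the first few ids can be decoded back to WordPiece tokens to confirm the leading [CLS] and the post-padding:

# Decode the first ten ids of the first review back to tokens
print(tokenizer.convert_ids_to_tokens(padded_bert_ids[0][:10].tolist()))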

Attention masks

  • As before, we mark which positions hold real tokens and which hold padding, so the attention layers can ignore the padded positions.
attention_masks = []
for seq in padded_bert_ids:
    seq_mask = [float(i>0) for i in seq]
    attention_masks.append(seq_mask)
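Since padded_bert_ids is already a NumPy array and the [PAD] id is 0, the same mask can be built in one vectorized line; an equivalent sketch:

# Vectorized alternative: 1.0 where a real token sits, 0.0 on padding
attention_masks = (padded_bert_ids > 0).astype(float).tolist()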

Splitting the train, val, and test sets

  • All preprocessing is finished before the split, so there is no need to split first and then preprocess each set separately.
  • 70% of the data goes to the train-val set and 30% to the test set; 10% of the train-val set is then held out as the val set.
  • So 63% of the full data is used for weight updates during training.
  • If you run this outside jupyter or colab (i.e., the interpreter is not ipython), replace display with print.
X_train, X_test, y_train, y_test = \
train_test_split(padded_bert_ids, classes_oh, random_state=42, test_size=0.3)

# The same random_state and sample count reproduce the same shuffle,
# so the mask split stays aligned with the id split above.
masks_train, masks_test, _, _ = train_test_split(attention_masks, padded_bert_ids, 
                                                 random_state=42, test_size=0.3)

X_train, X_val, y_train, y_val = \
train_test_split(X_train, y_train, random_state=42, test_size=0.1)

masks_train, masks_val, _, _ = train_test_split(masks_train, masks_train, 
                                                 random_state=42, test_size=0.1)
display(
    f"X_train: {X_train.shape}",
    f"X_val: {X_val.shape}",
    f"X_test: {X_test.shape}",
    f"y_train: {y_train.shape}",
    f"y_val: {y_val.shape}",
    f"y_test: {y_test.shape}",
    f"masks_train: {len(masks_train)}",
    f"masks_val: {len(masks_val)}",
    f"masks_test: {len(masks_test)}",
)
  • output
'X_train: (31500, 512)'
'X_val: (3500, 512)'
'X_test: (15000, 512)'
'y_train: (31500,)'
'y_val: (3500,)'
'y_test: (15000,)'
'masks_train: 31500'
'masks_val: 3500'
'masks_test: 15000'
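As an aside, scikit-learn's train_test_split can split several arrays in one call, which keeps ids, labels, and masks aligned without repeating the random_state trick. A sketch with hypothetical variable names:

# Split inputs, labels, and masks together in a single call
X_tr, X_te, y_tr, y_te, m_tr, m_te = train_test_split(
    padded_bert_ids, classes_oh, np.array(attention_masks),
    random_state=42, test_size=0.3)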

Converting to PyTorch tensors

train_inputs = torch.tensor(X_train)
train_labels = torch.tensor(y_train)
train_masks = torch.tensor(masks_train)
validation_inputs = torch.tensor(X_val)
validation_labels = torch.tensor(y_val)
validation_masks = torch.tensor(masks_val)

test_inputs = torch.tensor(X_test)
test_labels = torch.tensor(y_test)
test_masks = torch.tensor(masks_test)

print(train_inputs.shape)
print(train_labels.shape)
print(train_masks.shape)
print(validation_inputs.shape)
print(validation_labels.shape)
print(validation_masks.shape)
print(test_inputs.shape)
print(test_labels.shape)
print(test_masks.shape)
  • output
torch.Size([31500, 512])
torch.Size([31500])
torch.Size([31500, 512])
torch.Size([3500, 512])
torch.Size([3500])
torch.Size([3500, 512])
torch.Size([15000, 512])
torch.Size([15000])
torch.Size([15000, 512])

DataLoader setup

  • Unlike the previous data, where MAX_LEN was 128, MAX_LEN here is 512, so the batch size that fits in VRAM shrinks considerably.
BATCH_SIZE = 4

train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=BATCH_SIZE)

validation_data = TensorDataset(validation_inputs, validation_masks, validation_labels)
validation_sampler = SequentialSampler(validation_data)
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=BATCH_SIZE)

test_data = TensorDataset(test_inputs, test_masks, test_labels)
test_sampler = RandomSampler(test_data)
test_dataloader = DataLoader(test_data, sampler=test_sampler, batch_size=BATCH_SIZE)

GPU assignment

  • Assign GPU 0.
device = torch.device("cuda:0")
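If this notebook might also run on a machine without a GPU, a defensive variant (not in the original post) falls back to the CPU:

# Use GPU 0 when available, otherwise the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")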

Model creation and learning rate scheduling

  • We use the BertForSequenceClassification module from transformers.
  • The settings are the same as before.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.to(device)  # the model must sit on the same device as the batches

optimizer = AdamW(model.parameters(),
                  lr = 2e-5, # learning rate
                  eps = 1e-8 # epsilon to avoid division by zero
                )

epochs = 4
total_steps = len(train_dataloader) * epochs

scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)

Training

# Accuracy helper: compare the argmax of the logits with the labels
def flat_accuracy(preds, labels):
    
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()

    return np.sum(pred_flat == labels_flat) / len(labels_flat)

# Elapsed-time display helper
def format_time(elapsed):

    # Round to whole seconds
    elapsed_rounded = int(round((elapsed)))

    # Reformat as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))
    

# Fix the random seeds for reproducibility
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# Zero out the gradients
model.zero_grad()

# Loop over the epochs
for epoch_i in range(0, epochs):
    
    # ========================================
    #               Training
    # ========================================
    
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    # Record the start time
    t0 = time.time()

    # Reset the running loss
    total_loss = 0

    # Put the model in training mode
    model.train()
        
    # Iterate over batches from the dataloader
    for step, batch in enumerate(train_dataloader):
        # Report progress
        if step % 500 == 0 and not step == 0:
            elapsed = format_time(time.time() - t0)
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

        # Move the batch to the GPU
        batch = tuple(t.to(device) for t in batch)
        
        # Unpack the batch
        b_input_ids, b_input_mask, b_labels = batch

        # Forward pass
        outputs = model(b_input_ids, 
                        token_type_ids=None, 
                        attention_mask=b_input_mask, 
                        labels=b_labels)
        
        # Get the loss
        loss = outputs[0]

        # Accumulate the total loss
        total_loss += loss.item()

        # Backward pass to compute gradients
        loss.backward()

        # Clip gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # Update the weights using the gradients
        optimizer.step()

        # Step the scheduler to decay the learning rate
        scheduler.step()

        # Zero out the gradients
        model.zero_grad()

    # Compute the average loss
    avg_train_loss = total_loss / len(train_dataloader)            

    print("")
    print("  Average training loss: {0:.2f}".format(avg_train_loss))
    print("  Training epcoh took: {:}".format(format_time(time.time() - t0)))
        
    # ========================================
    #               Validation
    # ========================================

    print("")
    print("Running Validation...")

    # Record the start time
    t0 = time.time()

    # Put the model in evaluation mode
    model.eval()

    # Initialize accumulators
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0

    # Iterate over batches from the dataloader
    for batch in validation_dataloader:
        # Move the batch to the GPU
        batch = tuple(t.to(device) for t in batch)
        
        # Unpack the batch
        b_input_ids, b_input_mask, b_labels = batch
        
        # No gradient computation
        with torch.no_grad():     
            # Forward pass
            outputs = model(b_input_ids, 
                            token_type_ids=None, 
                            attention_mask=b_input_mask)
        
        # Get the logits (no labels were passed, so outputs[0] is the logits)
        logits = outputs[0]

        # Move the results to the CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        
        # Compare the output logits with the labels to compute accuracy
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        eval_accuracy += tmp_eval_accuracy
        nb_eval_steps += 1

    print("  Accuracy: {0:.2f}".format(eval_accuracy/nb_eval_steps))
    print("  Validation took: {:}".format(format_time(time.time() - t0)))

print("")
print("Training complete!")

output

# earlier epochs omitted

======== Epoch 4 / 4 ========
Training...
  Batch   500  of  7,875.    Elapsed: 0:02:04.
  Batch 1,000  of  7,875.    Elapsed: 0:04:06.
  Batch 1,500  of  7,875.    Elapsed: 0:06:08.
  Batch 2,000  of  7,875.    Elapsed: 0:08:11.
  Batch 2,500  of  7,875.    Elapsed: 0:10:13.
  Batch 3,000  of  7,875.    Elapsed: 0:12:14.
  Batch 3,500  of  7,875.    Elapsed: 0:14:17.
  Batch 4,000  of  7,875.    Elapsed: 0:16:18.
  Batch 4,500  of  7,875.    Elapsed: 0:18:20.
  Batch 5,000  of  7,875.    Elapsed: 0:20:27.
  Batch 5,500  of  7,875.    Elapsed: 0:22:29.
  Batch 6,000  of  7,875.    Elapsed: 0:24:30.
  Batch 6,500  of  7,875.    Elapsed: 0:26:35.
  Batch 7,000  of  7,875.    Elapsed: 0:28:38.
  Batch 7,500  of  7,875.    Elapsed: 0:30:42.

  Average training loss: 0.02
  Training epoch took: 0:32:15

Running Validation...
  Accuracy: 0.94
  Validation took: 0:01:01

Training complete!

Evaluating on the test set

# Record the start time
t0 = time.time()

# Put the model in evaluation mode
model.eval()

# Initialize accumulators
eval_loss, eval_accuracy = 0, 0
nb_eval_steps, nb_eval_examples = 0, 0

# Iterate over batches from the dataloader
for step, batch in enumerate(test_dataloader):
    # Report progress
    if step % 100 == 0 and not step == 0:
        elapsed = format_time(time.time() - t0)
        print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(test_dataloader), elapsed))

    # Move the batch to the GPU
    batch = tuple(t.to(device) for t in batch)
    
    # Unpack the batch
    b_input_ids, b_input_mask, b_labels = batch
    
    # No gradient computation
    with torch.no_grad():     
        # Forward pass
        outputs = model(b_input_ids, 
                        token_type_ids=None, 
                        attention_mask=b_input_mask)
    
    # Get the logits (no labels were passed, so outputs[0] is the logits)
    logits = outputs[0]

    # Move the results to the CPU
    logits = logits.detach().cpu().numpy()
    label_ids = b_labels.to('cpu').numpy()
    
    # Compare the output logits with the labels to compute accuracy
    tmp_eval_accuracy = flat_accuracy(logits, label_ids)
    eval_accuracy += tmp_eval_accuracy
    nb_eval_steps += 1

print("")
print("Accuracy: {0:.2f}".format(eval_accuracy/nb_eval_steps))
print("Test took: {:}".format(format_time(time.time() - t0)))
  Batch   100  of  3,750.    Elapsed: 0:00:07.
  Batch   200  of  3,750.    Elapsed: 0:00:14.
  Batch   300  of  3,750.    Elapsed: 0:00:21.
  Batch   400  of  3,750.    Elapsed: 0:00:28.
  Batch   500  of  3,750.    Elapsed: 0:00:35.
  Batch   600  of  3,750.    Elapsed: 0:00:42.
  Batch   700  of  3,750.    Elapsed: 0:00:49.
  Batch   800  of  3,750.    Elapsed: 0:00:56.
  Batch   900  of  3,750.    Elapsed: 0:01:03.
  Batch 1,000  of  3,750.    Elapsed: 0:01:10.
  Batch 1,100  of  3,750.    Elapsed: 0:01:17.
  Batch 1,200  of  3,750.    Elapsed: 0:01:24.
  Batch 1,300  of  3,750.    Elapsed: 0:01:31.
  Batch 1,400  of  3,750.    Elapsed: 0:01:38.
  Batch 1,500  of  3,750.    Elapsed: 0:01:45.
  Batch 1,600  of  3,750.    Elapsed: 0:01:52.
  Batch 1,700  of  3,750.    Elapsed: 0:01:59.
  Batch 1,800  of  3,750.    Elapsed: 0:02:06.
  Batch 1,900  of  3,750.    Elapsed: 0:02:13.
  Batch 2,000  of  3,750.    Elapsed: 0:02:20.
  Batch 2,100  of  3,750.    Elapsed: 0:02:27.
  Batch 2,200  of  3,750.    Elapsed: 0:02:34.
  Batch 2,300  of  3,750.    Elapsed: 0:02:41.
  Batch 2,400  of  3,750.    Elapsed: 0:02:48.
  Batch 2,500  of  3,750.    Elapsed: 0:02:55.
  Batch 2,600  of  3,750.    Elapsed: 0:03:02.
  Batch 2,700  of  3,750.    Elapsed: 0:03:09.
  Batch 2,800  of  3,750.    Elapsed: 0:03:16.
  Batch 2,900  of  3,750.    Elapsed: 0:03:23.
  Batch 3,000  of  3,750.    Elapsed: 0:03:30.
  Batch 3,100  of  3,750.    Elapsed: 0:03:37.
  Batch 3,200  of  3,750.    Elapsed: 0:03:44.
  Batch 3,300  of  3,750.    Elapsed: 0:03:51.
  Batch 3,400  of  3,750.    Elapsed: 0:03:58.
  Batch 3,500  of  3,750.    Elapsed: 0:04:05.
  Batch 3,600  of  3,750.    Elapsed: 0:04:12.
  Batch 3,700  of  3,750.    Elapsed: 0:04:19.

Accuracy: 0.94
Test took: 0:04:23
  • The test set accuracy comes out to 94%.

Comparing labels and predictions

  • To eyeball the performance, we compare the model's predictions directly with the true labels.
  • For each review, the predicted class and the true label are printed, in that order. The loop calls a test_sentences helper that is not defined in this post; a sketch of it follows below.
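A minimal sketch of what test_sentences could look like, assuming it repeats the same preprocessing as above (tokenize with [CLS]/[SEP], convert to ids, pad to MAX_LEN, build masks) and returns the model's logits; the names and details here are assumptions, not the author's original helper:

def test_sentences(sentences):
    # Same preprocessing pipeline as the training data
    tokens = bert_tokenize(sentences)
    ids = [tokenizer.convert_tokens_to_ids(t) for t in tokens]
    ids = pad_sequences(ids, maxlen=MAX_LEN, dtype='long',
                        truncating='post', padding='post')
    masks = [[float(i > 0) for i in seq] for seq in ids]

    inputs = torch.tensor(ids).to(device)
    masks = torch.tensor(masks).to(device)

    # Run the fine-tuned model without computing gradients
    model.eval()
    with torch.no_grad():
        outputs = model(inputs, token_type_ids=None, attention_mask=masks)

    # Return logits as a numpy array; np.argmax over them gives the predicted class
    return outputs[0].detach().cpu().numpy()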
for i in range(20):
    t = [df.iloc[i].review]
    print(t)
    print(np.argmax(test_sentences(t)), classes_oh[i])
  • output
["One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side."]
1 1
['A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. <br /><br />The actors are extremely well chosen- Michael Sheen not only "has got all the polari" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams\' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master\'s of comedy and his life. <br /><br />The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional \'dream\' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell\'s murals decorating every surface) are terribly well done.']
1 1
['I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.<br /><br />This was the most I\'d laughed at one of Woody\'s comedies in years (dare I say a decade?). While I\'ve never been impressed with Scarlet Johanson, in this she managed to tone down her "sexy" image and jumped right into a average, but spirited young woman.<br /><br />This may not be the crown jewel of his career, but it was wittier than "Devil Wears Prada" and more interesting than "Superman" a great comedy to go see with friends.']
1 1
["Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br /><br />OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.<br /><br />3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them."]
0 0
['Petter Mattei\'s "Love in the Time of Money" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter. <br /><br />This being a variation on the Arthur Schnitzler\'s play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.<br /><br />The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.<br /><br />The acting is good under Mr. Mattei\'s direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.<br /><br />We wish Mr. Mattei good luck and await anxiously for his next work.']
1 1
['Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it\'s not preachy or boring. It just never gets old, despite my having seen it some 15 or more times in the last 25 years. Paul Lukas\' performance brings tears to my eyes, and Bette Davis, in one of her very few truly sympathetic roles, is a delight. The kids are, as grandma says, more like "dressed-up midgets" than children, but that only makes them more fun to watch. And the mother\'s slow awakening to what\'s happening in the world and under her own roof is believable and startling. If I had a dozen thumbs, they\'d all be "up" for this movie.']
1 1
["I sure would like to see a resurrection of a up dated Seahunt series with the tech they have today it would bring back the kid excitement in me.I grew up on black and white TV and Seahunt with Gunsmoke were my hero's every week.You have my vote for a comeback of a new sea hunt.We need a change of pace in TV and this would work for a world of under water adventure.Oh by the way thank you for an outlet like this to view many viewpoints about TV and the many movies.So any ole way I believe I've got what I wanna say.Would be nice to read some more plus points about sea hunt.If my rhymes would be 10 lines would you let me submit,or leave me out to be in doubt and have me to quit,If this is so then I must go so lets do it."]
1 1
["This show was an amazing, fresh & innovative idea in the 70's when it first aired. The first 7 or 8 years were brilliant, but things dropped off after that. By 1990, the show was not really funny anymore, and it's continued its decline further to the complete waste of time it is today.<br /><br />It's truly disgraceful how far this show has fallen. The writing is painfully bad, the performances are almost as bad - if not for the mildly entertaining respite of the guest-hosts, this show probably wouldn't still be on the air. I find it so hard to believe that the same creator that hand-selected the original cast also chose the band of hacks that followed. How can one recognize such brilliance and then see fit to replace it with such mediocrity? I felt I must give 2 stars out of respect for the original cast that made this show such a huge success. As it is now, the show is just awful. I can't believe it's still on the air."]
0 0
["Encouraged by the positive comments about this film on here I was looking forward to watching this film. Bad mistake. I've seen 950+ films and this is truly one of the worst of them - it's awful in almost every way: editing, pacing, storyline, 'acting,' soundtrack (the film's only song - a lame country tune - is played no less than four times). The film looks cheap and nasty and is boring in the extreme. Rarely have I been so happy to see the end credits of a film. <br /><br />The only thing that prevents me giving this a 1-score is Harvey Keitel - while this is far from his best performance he at least seems to be making a bit of an effort. One for Keitel obsessives only."]
0 0
['If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love this movie, hell even my mom liked it.<br /><br />Great Camp!!!']
1 1
['Phil the Alien is one of those quirky films where the humour is based around the oddness of everything rather than actual punchlines.<br /><br />At first it was very odd and pretty funny but as the movie progressed I didn\'t find the jokes or oddness funny anymore.<br /><br />Its a low budget film (thats never a problem in itself), there were some pretty interesting characters, but eventually I just lost interest.<br /><br />I imagine this film would appeal to a stoner who is currently partaking.<br /><br />For something similar but better try "Brother from another planet"']
0 0
["I saw this movie when I was about 12 when it came out. I recall the scariest scene was the big bird eating men dangling helplessly from parachutes right out of the air. The horror. The horror.<br /><br />As a young kid going to these cheesy B films on Saturday afternoons, I still was tired of the formula for these monster type movies that usually included the hero, a beautiful woman who might be the daughter of a professor and a happy resolution when the monster died in the end. I didn't care much for the romantic angle as a 12 year old and the predictable plots. I love them now for the unintentional humor.<br /><br />But, about a year or so later, I saw Psycho when it came out and I loved that the star, Janet Leigh, was bumped off early in the film. I sat up and took notice at that point. Since screenwriters are making up the story, make it up to be as scary as possible and not from a well-worn formula. There are no rules."]
0 0
['So im not a big fan of Boll\'s work but then again not many are. I enjoyed his movie Postal (maybe im the only one). Boll apparently bought the rights to use Far Cry long ago even before the game itself was even finsished. <br /><br />People who have enjoyed killing mercs and infiltrating secret research labs located on a tropical island should be warned, that this is not Far Cry... This is something Mr Boll have schemed together along with his legion of schmucks.. Feeling loneley on the set Mr Boll invites three of his countrymen to play with. These players go by the names of Til Schweiger, Udo Kier and Ralf Moeller.<br /><br />Three names that actually have made them selfs pretty big in the movie biz. So the tale goes like this, Jack Carver played by Til Schweiger (yes Carver is German all hail the bratwurst eating dudes!!) However I find that Tils acting in this movie is pretty badass.. People have complained about how he\'s not really staying true to the whole Carver agenda but we only saw carver in a first person perspective so we don\'t really know what he looked like when he was kicking a**.. <br /><br />However, the storyline in this film is beyond demented. We see the evil mad scientist Dr. Krieger played by Udo Kier, making Genetically-Mutated-soldiers or GMS as they are called. Performing his top-secret research on an island that reminds me of "SPOILER" Vancouver for some reason. Thats right no palm trees here. Instead we got some nice rich lumberjack-woods. We haven\'t even gone FAR before I started to CRY (mehehe) I cannot go on any more.. If you wanna stay true to Bolls shenanigans then go and see this movie you will not be disappointed it delivers the true Boll experience, meaning most of it will suck.<br /><br />There are some things worth mentioning that would imply that Boll did a good work on some areas of the film such as some nice boat and fighting scenes. Until the whole cromed/albino GMS squad enters the scene and everything just makes me laugh.. The movie Far Cry reeks of scheisse (that\'s poop for you simpletons) from a fa,r if you wanna take a wiff go ahead.. BTW Carver gets a very annoying sidekick who makes you wanna shoot him the first three minutes he\'s on screen.']
0 0
["The cast played Shakespeare.<br /><br />Shakespeare lost.<br /><br />I appreciate that this is trying to bring Shakespeare to the masses, but why ruin something so good.<br /><br />Is it because 'The Scottish Play' is my favorite Shakespeare? I do not know. What I do know is that a certain Rev Bowdler (hence bowdlerization) tried to do something similar in the Victorian era.<br /><br />In other words, you cannot improve perfection.<br /><br />I have no more to write but as I have to write at least ten lines of text (and English composition was never my forte I will just have to keep going and say that this movie, as the saying goes, just does not cut it."]
0 0
["This a fantastic movie of three prisoners who become famous. One of the actors is george clooney and I'm not a fan but this roll is not bad. Another good thing about the movie is the soundtrack (The man of constant sorrow). I recommand this movie to everybody. Greetings Bart"]
1 1
["Kind of drawn in by the erotic scenes, only to realize this was one of the most amateurish and unbelievable bits of film I've ever seen. Sort of like a high school film project. What was Rosanna Arquette thinking?? And what was with all those stock characters in that bizarre supposed Midwest town? Pretty hard to get involved with this one. No lessons to be learned from it, no brilliant insights, just stilted and quite ridiculous (but lots of skin, if that intrigues you) videotaped nonsense....What was with the bisexual relationship, out of nowhere, after all the heterosexual encounters. And what was with that absurd dance, with everybody playing their stereotyped roles? Give this one a pass, it's like a million other miles of bad, wasted film, money that could have been spent on starving children or Aids in Africa....."]
0 0
["Some films just simply should not be remade. This is one of them. In and of itself it is not a bad film. But it fails to capture the flavor and the terror of the 1963 film of the same title. Liam Neeson was excellent as he always is, and most of the cast holds up, with the exception of Owen Wilson, who just did not bring the right feel to the character of Luke. But the major fault with this version is that it strayed too far from the Shirley Jackson story in it's attempts to be grandiose and lost some of the thrill of the earlier film in a trade off for snazzier special effects. Again I will say that in and of itself it is not a bad film. But you will enjoy the friction of terror in the older version much more."]
1 1
["This movie made it into one of my top 10 most awful movies. Horrible. <br /><br />There wasn't a continuous minute where there wasn't a fight with one monster or another. There was no chance for any character development, they were too busy running from one sword fight to another. I had no emotional attachment (except to the big bad machine that wanted to destroy them) <br /><br />Scenes were blatantly stolen from other movies, LOTR, Star Wars and Matrix. <br /><br />Examples<br /><br />>The ghost scene at the end was stolen from the final scene of the old Star Wars with Yoda, Obee One and Vader. <br /><br />>The spider machine in the beginning was exactly like Frodo being attacked by the spider in Return of the Kings. (Elijah Wood is the victim in both films) and wait......it hypnotizes (stings) its victim and wraps them up.....uh hello????<br /><br />>And the whole machine vs. humans theme WAS the Matrix..or Terminator.....<br /><br />There are more examples but why waste the time? And will someone tell me what was with the Nazi's?!?! Nazi's???? <br /><br />There was a juvenile story line rushed to a juvenile conclusion. The movie could not decide if it was a children's movie or an adult movie and wasn't much of either. <br /><br />Just awful. A real disappointment to say the least. Save your money."]
0 0
['I remember this film,it was the first film i had watched at the cinema the picture was dark in places i was very nervous it was back in 74/75 my Dad took me my brother & sister to Newbury cinema in Newbury Berkshire England. I recall the tigers and the lots of snow in the film also the appearance of Grizzly Adams actor Dan Haggery i think one of the tigers gets shot and dies. If anyone knows where to find this on DVD etc please let me know.The cinema now has been turned in a fitness club which is a very big shame as the nearest cinema now is 20 miles away, would love to hear from others who have seen this film or any other like it.']
1 1
["An awful film! It must have been up against some real stinkers to be nominated for the Golden Globe. They've taken the story of the first famous female Renaissance painter and mangled it beyond recognition. My complaint is not that they've taken liberties with the facts; if the story were good, that would perfectly fine. But it's simply bizarre -- by all accounts the true story of this artist would have made for a far better film, so why did they come up with this dishwater-dull script? I suppose there weren't enough naked people in the factual version. It's hurriedly capped off in the end with a summary of the artist's life -- we could have saved ourselves a couple of hours if they'd favored the rest of the film with same brevity."]
0 0