Articles

The Theory of Generalization: From Break Points to the VC Inequality

The Explanation of Function B

$B(N, k)$ denotes the maximum number of dichotomies on $N$ points that any hypothesis set with break point $k$ can produce. For example, a 2-dimensional perceptron has break point 4, so $B(1000, 4)$ is an upper bound on the number of dichotomies it can produce on 1000 data points.
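
To make the size of this quantity concrete, here is a minimal sketch that evaluates the standard bound $B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i}$ and cross-checks it against the recursion $B(N, k) \le B(N-1, k) + B(N-1, k-1)$. The function names `sauer_bound` and `recursive_bound` are illustrative choices, not names used in the article.

```python
from functools import lru_cache
from math import comb


def sauer_bound(n: int, k: int) -> int:
    """Closed-form bound: sum of C(n, i) for i = 0 .. k-1."""
    return sum(comb(n, i) for i in range(k))


@lru_cache(maxsize=None)
def recursive_bound(n: int, k: int) -> int:
    """The same bound built from B(n, k) <= B(n-1, k) + B(n-1, k-1)."""
    if k == 1:
        return 1  # break point 1: only a single dichotomy is possible
    if n == 1:
        return 2  # one point and k > 1: both labels are allowed
    return recursive_bound(n - 1, k) + recursive_bound(n - 1, k - 1)


if __name__ == "__main__":
    print(sauer_bound(1000, 4))
    # The recursion is checked on small inputs only, to stay well inside
    # Python's default recursion limit.
    print(recursive_bound(10, 4) == sauer_bound(10, 4))
```

For $N = 1000$ and $k = 4$ the bound evaluates to 166,667,501, which is polynomial in $N$ and vastly smaller than the $2^{1000}$ dichotomies possible without a break point.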

From Shooting Hoops to the Geometric Series

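Problem Definition

Definition: Suppose two players, A and B, agree to settle a contest by shooting hoops. The rule: they take turns, one shot each, and the first to score wins. If A and B shoot from the same spot, with per-shot success probabilities $p$ and $q$ respectively, and A shoots first, what is the probability that A wins?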
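
One way to approach this, assuming independent shots, is to note that A wins on round $n$ exactly when both players miss in the first $n$ rounds and A then scores, which gives the geometric series $\sum_{n \ge 0} p\,[(1-p)(1-q)]^n = \frac{p}{1 - (1-p)(1-q)}$. The sketch below compares that closed form with a truncated sum; the function names are illustrative, and it assumes $p$ and $q$ are not both zero.

```python
from fractions import Fraction


def first_shooter_wins(p, q):
    """Closed form of the series p * sum(r**n), with r = (1 - p) * (1 - q)."""
    r = (1 - p) * (1 - q)  # probability that a full round passes with two misses
    return p / (1 - r)


def partial_sum(p, q, rounds):
    """Probability that A wins within the first `rounds` rounds."""
    r = (1 - p) * (1 - q)
    return sum(p * r**n for n in range(rounds))


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(first_shooter_wins(half, half))  # exact value as a fraction
    print(float(partial_sum(half, half, 60)))  # truncated series, for comparison
```

With $p = q = 1/2$ the closed form gives $2/3$, so shooting first is a real advantage even between equally skilled players.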

A Research on the Birthday Problem

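Problem Definition

Definition: How many people must be in a room for the probability that at least two of them share a birthday to be at least 50%?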
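
A minimal sketch of the usual calculation follows, under the common simplifying assumptions of 365 equally likely birthdays and independence between people (no leap years, no twins). The function names are illustrative.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_group(threshold: float = 0.5, days: int = 365) -> int:
    """Smallest n whose shared-birthday probability reaches the threshold."""
    n = 1
    while shared_birthday_probability(n, days) < threshold:
        n += 1
    return n


if __name__ == "__main__":
    n = smallest_group()
    print(n, round(shared_birthday_probability(n), 4))
```

Under these assumptions the threshold is first crossed at 23 people, where the probability is roughly 50.7%.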

Advanced Data Analytic Algorithms

Week 1

Model selection with AI assistance

Based on the AI’s suggestion, SVM is initially selected as the model for the research implementation.

Dataset: SMS Spam Collection

Task: Text Classification

Convert Jupyter Notebook to PDF With Reporting Format

Install nbconvert via pip

pip install nbconvert

Now, the command

jupyter nbconvert --to pdf --template-file report filename.ipynb

should work well, producing a document with a table of contents and properly formatted code blocks. However, an unnecessary counter appears before each code block, wasting space and serving no purpose. To remove it, a custom template is required.

Merge Anki Audios

Export all selected notes with media references
Extract the media paths from the exported file
Write a Python script that merges all selected audio files into one

import os
import subprocess
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import random

AUDIO_DIRECTORY = r"C:\Users\kyle\AppData\Roaming\Anki2\Eng\collection.media"
SILENT = r"C:\Users\momo\Home\Projects\Anki\silent.mp3"

list_file_path = "../notes.txt"
df = pd.read_csv(
    list_file_path,
    sep="\t",
    header=None,
    names=[
        "word", "lemma", "context", "sentence", "examples", "symbol", "voc", "def",
        "word_sound", "sentence_sound"
    ],
)

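# Media fields look like "[sound:name.mp3]"; [7:-1] strips "[sound:" and "]".
word_audio_names = df["word_sound"].apply(lambda x: x.strip()[7:-1]).tolist()
# Notes without a sentence recording become "" and are skipped later.
df["sentence_sound"] = df["sentence_sound"].fillna("")
sentence_audio_names = df["sentence_sound"].apply(lambda x: x.strip()[7:-1]).tolist()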

audio_paths = [os.path.join(AUDIO_DIRECTORY, item) for item in word_audio_names] + [
    os.path.join(AUDIO_DIRECTORY, item) for item in sentence_audio_names if item != ""
]


def normalize_audio(input_file, output_file):
    try:
        command = [
            "ffmpeg", "-loglevel", "quiet", "-y", "-i", input_file, "-ar", "24000",
            "-ab", "96k", output_file
        ]
        subprocess.run(command, check=True)
        return output_file
    except subprocess.CalledProcessError:
        # ffmpeg failed on this file; report the failure to the caller
        return None


def normalize_audio_multithreaded(file_list, output_dir, max_workers=16):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print("start normalizing audios")
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = [
                executor.submit(normalize_audio, input_file,
                                os.path.join(output_dir, os.path.basename(input_file)))
                for input_file in file_list
            ]
            for _ in tqdm(as_completed(futures), total=len(file_list)):
                pass
    else:
        print("files are already prepared")


normalize_audio_multithreaded(audio_paths, "temp")

audio_paths = list(zip(word_audio_names, sentence_audio_names))
random.shuffle(audio_paths)

# Build the ffmpeg concat list: each word is played twice, then its sentence
# (if any), with silent.mp3 inserted as a pause between clips.
with open("file_list.txt", "w") as f:
    for w, s in audio_paths:
        f.write(f"file '{os.path.join('temp', w)}'\n")
        f.write("file silent.mp3\n")
        f.write(f"file '{os.path.join('temp', w)}'\n")
        f.write("file silent.mp3\n")
        if s != "":
            f.write(f"file '{os.path.join('temp', s)}'\n")
            f.write("file silent.mp3\n")
            f.write("file silent.mp3\n")

command = [
    "ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "file_list.txt", "-c", "copy",
    "output.mp3"
]
subprocess.run(command, check=True)

Prune a Dictionary File (.mdx)

Introduction

I use dictionaries such as “Oxford” and “Longman” as the sources for the definitions in my flashcards. However, they often contain far more information than I need, which makes the dictionary files unwieldy, slows down loading, and distracts from the definitions.

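Don't Starve Together: Dedicated Server and Multi-Level World Configuration Guide

This article assumes a Linux environment throughout, so some experience with Linux is required.

Handy Uses of GNU Parallel

Android QQ stores its image files in the structure shown below; the files whose names start with Cache_ are the images. With this structure you have to open each folder separately to see the images, which is inconvenient.

Batch-Renaming File Extensions and Batch Format Conversion with Shell

A shell script is the simplest and fastest way to handle this kind of work.

Renaming extensions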
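
As a rough illustration of the flattening task, the sketch below gathers every file whose name starts with `Cache_` from nested folders into a single directory. It is a plain Python stand-in, not the GNU Parallel approach the article's title refers to; the function name, the collision-avoiding naming scheme, and the example paths are all assumptions, and the target directory is expected to lie outside the source tree.

```python
import shutil
from pathlib import Path


def collect_cache_images(source_root, target_dir, prefix="Cache_"):
    """Copy every file named prefix* under source_root into target_dir."""
    source_root = Path(source_root)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() materializes the listing before any file is copied.
    for path in sorted(source_root.rglob(prefix + "*")):
        if not path.is_file():
            continue
        # Prepend the parent folder name so equal file names do not collide.
        destination = target_dir / f"{path.parent.name}_{path.name}"
        shutil.copy2(path, destination)
        copied.append(destination)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; replace them with the real source and target folders.
    for item in collect_cache_images("qq_images", "flattened"):
        print(item)
```

Copying rather than moving leaves the original folders untouched, so the result can be inspected before anything is deleted.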

for file in *.old_ext; do mv -- "$file" "${file%.old_ext}.new_ext"; done

Format conversion

HEIC to JPG

First, install the conversion tool.