
Build Your Own BGM: A Python Deep Learning Walkthrough

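Programming Diary 2024-11-12 13:00:00

Author | 李秋键

Header image | downloaded from 视觉中国 (Visual China Group)

Produced by | AI科技大本营 (ID: rgznai100)

Music plus text: taken together, they work even better.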

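Introduction:

"Those who were seen dancing were thought to be insane by those who could not hear the music." This line, attributed to Nietzsche, is a fun one, and it also reminds us how indispensable music is to everyday life. For most people, though, mastering an instrument is hard. So today we build a music project that an ordinary person can pull off: using deep learning to have the computer generate the music we want. The full code is linked at the end of the article.

A sample of the generated result is shown below: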

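Model Building

1.1 Environment Requirements

This project runs on Python 3.6.5 on Windows. The main libraries are:

argparse is Python's built-in command-line parsing package and makes it easy to read command-line arguments;

glob finds local files by pattern; here it is used to collect the training data set quickly;

pickle serializes Python objects to disk and reads them back when needed. In machine learning it is often used to store the result of an expensive step so that it can be loaded later instead of being recomputed, which saves a lot of time. Most built-in Python objects can be pickled, although some (open file handles, for example) cannot.

Keras is a high-level neural network API written in pure Python that runs on top of the TensorFlow, Theano, and CNTK backends. Its core data structure is the "model", a way of organizing network layers. The main model in Keras is the Sequential model, a stack of layers arranged in order. Here we use it to build a BLSTM (bidirectional LSTM) model.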
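As a rough illustration of how the three standard-library pieces above might fit together, the sketch below parses a command-line option, collects MIDI files with glob, and caches a list with pickle. The `--data-dir` flag, the helper names (`build_parser`, `find_midi_files`, `cache_notes`, `load_notes`), and the file names are assumptions made for this example, not names taken from the original project.

```python
import argparse
import glob
import os
import pickle
import tempfile


def build_parser():
    # Hypothetical CLI: the flag name is an assumption for illustration.
    parser = argparse.ArgumentParser(description="Collect MIDI files for training")
    parser.add_argument("--data-dir", default="midi_songs",
                        help="directory that holds the .mid files")
    return parser


def find_midi_files(data_dir):
    # glob expands the wildcard into a list of matching paths.
    return sorted(glob.glob(os.path.join(data_dir, "*.mid")))


def cache_notes(notes, path):
    # Serialize the parsed tokens so later runs can skip parsing.
    with open(path, "wb") as f:
        pickle.dump(notes, f)


def load_notes(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Small self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.mid", "b.mid", "readme.txt"):
        open(os.path.join(tmp, name), "wb").close()

    args = build_parser().parse_args(["--data-dir", tmp])
    songs = find_midi_files(args.data_dir)
    found = [os.path.basename(p) for p in songs]

    cache_path = os.path.join(tmp, "notes.pkl")
    cache_notes(["C4$1.0", "NULL", "4.7.11$0.5"], cache_path)
    restored = load_notes(cache_path)
```

Only the two `.mid` files end up in `found`, and `restored` is equal to the list that was cached.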

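1.2 Dataset Processing

The music files used in this project are MIDI files. They are easy to parse and learn from, and they bring a practical benefit: the pitch and duration of every note can be read off directly. Two settings matter most for the network: the time step and the sequence length. The time step determines the resolution at which we analyze and produce notes, while the sequence length determines how we learn the patterns in a song. We set a time step of 0.25 seconds and eight notes per time step. This corresponds to a 4/4 time signature, which for us means eight different sequences of four notes each. By learning these sequences and repeating them, we can generate patterns that sound like real music and build on top of them.

An important part of music is the dynamic, creative use of variable-length notes and rests. A long, sustained note followed by a quiet pause, for example, can send a wave of emotion to the listener as the performer pours their heart out. To capture this, we represent long notes, short notes, and rests explicitly, so that different emotions can emerge over the course of a song.

(1) Collect all notes and chords in the training set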

    import pickle

    from music21 import chord, converter, instrument, note

    # Fragment from a method of the training class; TIMESTEP is a module-level constant.
    notes = []
    for file in self.songs:
        print("Parsing %s" % file)
        try:
            midi = converter.parse(file)
        except IndexError as e:
            print(f"Could not parse {file}")
            print(e)
            continue

        notes_to_parse = None
        try:
            # The file has instrument parts: use the first one.
            s2 = instrument.partitionByInstrument(midi)
            notes_to_parse = s2.parts[0].recurse()
        except Exception:
            # Otherwise fall back to the flat list of notes.
            notes_to_parse = midi.flat.notes

        prev_offset = 0.0
        for element in notes_to_parse:
            if isinstance(element, (note.Note, chord.Chord)):
                duration = element.duration.quarterLength
                if isinstance(element, note.Note):
                    name = element.pitch
                else:
                    name = ".".join(str(n) for n in element.normalOrder)
                # Each token is "<pitch or chord>$<duration>".
                notes.append(f"{name}${duration}")

                # Fill the gap since the previous element with "NULL" rest tokens.
                rest_notes = int((element.offset - prev_offset) / TIMESTEP - 1)
                for _ in range(0, rest_notes):
                    notes.append("NULL")

            prev_offset = element.offset

    # Cache the parsed tokens so they can be reloaded without re-parsing.
    with open("notes/" + self.model_name, "wb") as filepath:
        pickle.dump(notes, filepath)
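To see what the token stream produced by this step looks like, here is a minimal sketch that applies the same encoding rule to plain tuples instead of music21 objects. The `encode_events` helper and the `(offset, name, duration)` tuples are hypothetical stand-ins introduced for this example. As in the code above, the note token is appended first and the "NULL" rest tokens for the preceding gap follow it.

```python
TIMESTEP = 0.25  # same unit as the offsets below (assumed for this sketch)


def encode_events(events, timestep=TIMESTEP):
    """Turn (offset, name, duration) tuples into tokens.

    `events` is a hypothetical stand-in for the music21 elements:
    each tuple is (offset, pitch-or-chord name, duration).
    """
    tokens = []
    prev_offset = 0.0
    for offset, name, dur in events:
        tokens.append(f"{name}${dur}")
        # Number of empty time steps between the previous event and this one.
        rest_steps = int((offset - prev_offset) / timestep - 1)
        for _ in range(rest_steps):
            tokens.append("NULL")
        prev_offset = offset
    return tokens


events = [
    (0.0, "C4", 0.25),      # single note
    (0.25, "E4", 0.25),     # next note, no gap
    (1.0, "0.4.7", 0.5),    # chord in normal order, after a gap
]
tokens = encode_events(events)
# tokens == ["C4$0.25", "E4$0.25", "0.4.7$0.5", "NULL", "NULL"]
```

The gap of 0.75 before the chord spans three time steps, one of which is taken by the chord itself, so two "NULL" tokens are emitted.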

1.3 Preparing Sequences for the Neural Network

To build the BLSTM network, the data first has to be turned into sequences.

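    import numpy
    from keras.utils import np_utils  # Keras 2.x import path

    def prepare_sequences(self, notes, n_vocab):
        # Get all distinct token names
        pitchnames = sorted(set(item for item in notes))

        # Map each token to an integer; the rest token "NULL" is fixed at 0
        note_to_int = dict((note, number + 1) for number, note in enumerate(pitchnames))
        note_to_int["NULL"] = 0

        network_input = []
        network_output = []

        # Slide a window of SEQUENCE_LEN tokens; the token that follows is the target
        for i in range(0, len(notes) - SEQUENCE_LEN, 1):
            sequence_in = notes[i : i + SEQUENCE_LEN]
            sequence_out = notes[i + SEQUENCE_LEN]
            network_input.append([note_to_int[char] for char in sequence_in])
            network_output.append(note_to_int[sequence_out])

        n_patterns = len(network_input)

        # Reshape to (samples, time steps, features) as the LSTM layers expect
        network_input = numpy.reshape(network_input, (n_patterns, SEQUENCE_LEN, 1))
        # Normalize the integer ids to the range [0, 1)
        network_input = network_input / float(n_vocab)

        print(network_output)
        # One-hot encode the targets for categorical cross-entropy
        network_output = np_utils.to_categorical(network_output)

        return (network_input, network_output)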
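The sliding-window idea can be checked by hand on a tiny token list. The sketch below is plain Python without NumPy or Keras. The helper names (`build_vocab`, `make_windows`, `one_hot`) and the shortened `SEQUENCE_LEN` are assumptions for illustration. It also numbers the tokens after removing "NULL", so the ids are contiguous, which is a small variation on the function above.

```python
SEQUENCE_LEN = 4  # shortened for the demo; an assumption, not the article's value


def build_vocab(notes):
    # "NULL" is reserved as 0; every other token gets a positive integer.
    names = sorted(set(notes) - {"NULL"})
    note_to_int = {name: i + 1 for i, name in enumerate(names)}
    note_to_int["NULL"] = 0
    return note_to_int


def make_windows(notes, note_to_int, seq_len=SEQUENCE_LEN):
    # Each window of seq_len tokens is an input; the next token is its target.
    inputs, targets = [], []
    for i in range(len(notes) - seq_len):
        window = notes[i:i + seq_len]
        inputs.append([note_to_int[n] for n in window])
        targets.append(note_to_int[notes[i + seq_len]])
    return inputs, targets


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


notes = ["C4$1.0", "NULL", "E4$0.5", "G4$0.5", "C4$1.0", "NULL"]
vocab = build_vocab(notes)
inputs, targets = make_windows(notes, vocab)
n_vocab = len(vocab)

# Scale ids into [0, 1) and one-hot encode the targets.
scaled = [[v / float(n_vocab) for v in row] for row in inputs]
encoded_targets = [one_hot(t, n_vocab) for t in targets]
```

Six tokens with a window of four give two training pairs: `[1, 0, 2, 3]` with target `1`, and `[0, 2, 3, 1]` with target `0`.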

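1.4 Building the Network

By looking at the notes both before and after a given position in a song, we can generate melodies that sound similar to what a human would write. When listening to music, what came before usually helps the listener predict what comes next. Many times, while listening to a song, I can move along to a particular rhythm because I can predict what will happen next. This is exactly what happens during a build-up: the song grows more and more intense, the listener feels tension in anticipation of the drop, and then comes that moment of release and excitement when it finally hits. By exploiting this, we can produce rhythms that sound natural and evoke the same emotions we have come to expect from modern music.

For the number of units in each BLSTM layer we chose 512. For the output activation we chose softmax. For the loss function we chose categorical cross-entropy, because it handles multi-class classification problems such as note prediction well. Finally, we chose RMSprop as the optimizer, which the Keras documentation recommends for RNNs.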

    from keras.callbacks import ModelCheckpoint
    from keras.layers import LSTM, Activation, Bidirectional, Dense, Dropout
    from keras.models import Sequential

    def train(self, network_input, network_output):
        """ train the neural network """
        filepath = (
            self.model_name + "-weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
        )
        # Save the weights whenever the training loss improves
        checkpoint = ModelCheckpoint(
            filepath, monitor="loss", verbose=0, save_best_only=True, mode="min"
        )
        callbacks_list = [checkpoint]

        self.model.fit(
            network_input,
            network_output,
            epochs=self.epochs,
            batch_size=64,
            callbacks=callbacks_list,
        )

    def create_network(network_input, n_vocab):
        """ create the structure of the neural network """
        print("Input shape ", network_input.shape)
        print("Output shape ", n_vocab)

        model = Sequential()
        # First BLSTM layer returns the full sequence for the next recurrent layer
        model.add(
            Bidirectional(
                LSTM(512, return_sequences=True),
                input_shape=(network_input.shape[1], network_input.shape[2]),
            )
        )
        model.add(Dropout(0.3))
        model.add(Bidirectional(LSTM(512)))
        # One output unit per token in the vocabulary
        model.add(Dense(n_vocab))
        model.add(Activation("softmax"))
        model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

        return model
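To make the softmax and cross-entropy choice concrete, here is a small worked example in plain Python. It is only a sketch of the two formulas, not of how Keras implements them internally, and the logits are made-up numbers. The function names `softmax` and `categorical_cross_entropy` are chosen for this example.

```python
import math


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, target_index):
    # Loss for a one-hot target: minus the log-probability of the true class.
    return -math.log(probs[target_index])


logits = [2.0, 1.0, 0.1]          # made-up scores for three candidate tokens
probs = softmax(logits)           # positive values that sum to 1

loss_right = categorical_cross_entropy(probs, 0)  # true token has the top score
loss_wrong = categorical_cross_entropy(probs, 2)  # true token has the lowest score
```

The loss is small when the network puts most of its probability on the correct next token and large when it does not, which is the signal that drives training.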

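Music Generation

One of the most important parts of composing music is structure. We set the structure as follows: the first verse is generated from a randomly chosen seed sequence, and the second verse is then generated from where the first verse ends. In effect, this generates one section that is twice as long and splits it in half. The reasoning is that within one piece of music the second verse should still fit the same mood, and using the first verse as its reference achieves that. The chorus and the bridge each start from their own random seed, and the sections are assembled as verse 1, chorus, verse 2, chorus, bridge, chorus.

(1) Generate notes from the neural network based on a sequence of notes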

    # Body of the generation method: network_input (a list of integer sequences),
    # model, n_vocab and int_to_note come from the enclosing scope.
    def get_start():
        # pick a random sequence from the input as a starting point for the prediction
        start = numpy.random.randint(0, len(network_input) - 1)
        # copy the seed so that appending to it does not modify network_input
        pattern = list(network_input[start])
        prediction_output = []
        return pattern, prediction_output

    # generate verse 1
    verse1_pattern, verse1_prediction_output = get_start()
    for note_index in range(4 * SEQUENCE_LEN):
        prediction_input = numpy.reshape(verse1_pattern, (1, len(verse1_pattern), 1))
        prediction_input = prediction_input / float(n_vocab)

        prediction = model.predict(prediction_input, verbose=0)

        index = numpy.argmax(prediction)
        print("index", index)
        result = int_to_note[index]
        verse1_prediction_output.append(result)

        # slide the window: append the new index and drop the oldest one
        verse1_pattern.append(index)
        verse1_pattern = verse1_pattern[1 : len(verse1_pattern)]

    # generate verse 2 (continues from the final window of verse 1)
    verse2_pattern = verse1_pattern
    verse2_prediction_output = []
    for note_index in range(4 * SEQUENCE_LEN):
        prediction_input = numpy.reshape(verse2_pattern, (1, len(verse2_pattern), 1))
        prediction_input = prediction_input / float(n_vocab)

        prediction = model.predict(prediction_input, verbose=0)

        index = numpy.argmax(prediction)
        print("index", index)
        result = int_to_note[index]
        verse2_prediction_output.append(result)

        verse2_pattern.append(index)
        verse2_pattern = verse2_pattern[1 : len(verse2_pattern)]

    # generate chorus
    chorus_pattern, chorus_prediction_output = get_start()
    for note_index in range(4 * SEQUENCE_LEN):
        prediction_input = numpy.reshape(chorus_pattern, (1, len(chorus_pattern), 1))
        prediction_input = prediction_input / float(n_vocab)

        prediction = model.predict(prediction_input, verbose=0)

        index = numpy.argmax(prediction)
        print("index", index)
        result = int_to_note[index]
        chorus_prediction_output.append(result)

        chorus_pattern.append(index)
        chorus_pattern = chorus_pattern[1 : len(chorus_pattern)]

    # generate bridge
    bridge_pattern, bridge_prediction_output = get_start()
    for note_index in range(4 * SEQUENCE_LEN):
        prediction_input = numpy.reshape(bridge_pattern, (1, len(bridge_pattern), 1))
        prediction_input = prediction_input / float(n_vocab)

        prediction = model.predict(prediction_input, verbose=0)

        index = numpy.argmax(prediction)
        print("index", index)
        result = int_to_note[index]
        bridge_prediction_output.append(result)

        bridge_pattern.append(index)
        bridge_pattern = bridge_pattern[1 : len(bridge_pattern)]

    # assemble the song: verse 1, chorus, verse 2, chorus, bridge, chorus
    return (
        verse1_prediction_output
        + chorus_prediction_output
        + verse2_prediction_output
        + chorus_prediction_output
        + bridge_prediction_output
        + chorus_prediction_output
    )
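The four loops above follow one pattern: predict the next index, record it, and slide the window forward by one. A possible way to express that once is sketched below. `generate_section` and `toy_predict` are hypothetical names, and `toy_predict` is only a stand-in for `model.predict` so that the example runs without a trained network.

```python
def generate_section(predict, pattern, length):
    """Autoregressively generate `length` indices from a seed pattern.

    `predict` is a hypothetical callable that takes the current window
    (a list of ints) and returns a list of scores, one per vocabulary index.
    """
    window = list(pattern)  # copy so the caller's seed is not mutated
    output = []
    for _ in range(length):
        scores = predict(window)
        index = max(range(len(scores)), key=scores.__getitem__)  # argmax
        output.append(index)
        window = window[1:] + [index]  # slide the window forward by one
    return output, window


def toy_predict(window):
    # Stand-in for model.predict: always favors (last index + 1) mod 4.
    nxt = (window[-1] + 1) % 4
    return [1.0 if i == nxt else 0.0 for i in range(4)]


seed = [0, 1, 2]
verse1, state = generate_section(toy_predict, seed, 4)
verse2, _ = generate_section(toy_predict, state, 4)      # continues from verse 1
chorus, _ = generate_section(toy_predict, [3, 3, 3], 4)  # independent seed
bridge, _ = generate_section(toy_predict, [1, 1, 1], 4)  # independent seed

# Same section order as the article's return statement.
song = verse1 + chorus + verse2 + chorus + bridge + chorus
```

Returning the final window alongside the output is what lets the second verse pick up where the first one ended, while the chorus and bridge start from seeds of their own.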

(2) Convert the predicted output into notes and create a MIDI file from them. Note and Chord objects are created from the values the model generated.

    import os

    from music21 import chord, duration, instrument, note, stream

    # Fragment from a method of the generation class.
    offset = 0
    output_notes = []

    for pattern in prediction_output:
        # Split "<name>$<duration>" tokens; durations may be fractions such as "1/3"
        if "$" in pattern:
            pattern, dur = pattern.split("$")
            if "/" in dur:
                a, b = dur.split("/")
                dur = float(a) / float(b)
            else:
                dur = float(dur)

        # pattern is a chord
        if ("." in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split(".")
            notes = []
            for current_note in notes_in_chord:
                new_note = note.Note(int(current_note))
                new_note.storedInstrument = instrument.Piano()
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            new_chord.duration = duration.Duration(dur)
            output_notes.append(new_chord)
        # pattern is a rest (compare with ==, not "is", for string equality)
        elif pattern == "NULL":
            offset += TIMESTEP
        # pattern is a note
        else:
            new_note = note.Note(pattern)
            new_note.offset = offset
            new_note.storedInstrument = instrument.Piano()
            new_note.duration = duration.Duration(dur)
            output_notes.append(new_note)

        # increase the offset on every iteration so that notes do not stack
        offset += TIMESTEP

    midi_stream = stream.Stream(output_notes)
    output_file = os.path.basename(self.weights) + ".mid"
    print("output to " + output_file)
    midi_stream.write("midi", fp=output_file)
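The token-decoding step can be isolated and tested without music21. The sketch below shows one way to do it. `parse_token` is a hypothetical helper, and it uses `fractions.Fraction` from the standard library in place of the manual split on "/", since `Fraction` accepts both "1/3" and "0.5" as strings.

```python
from fractions import Fraction


def parse_token(token):
    """Split a "name$duration" token into (kind, name, duration).

    Returns kind "rest" for the "NULL" token, "chord" for dot-separated
    or purely numeric names, and "note" otherwise.
    """
    if token == "NULL":
        return ("rest", None, None)
    name, dur_text = token.split("$")
    # Fraction parses both "1/3" and "0.5"; convert to float afterwards.
    dur = float(Fraction(dur_text))
    if "." in name or name.isdigit():
        # Chords are stored as pitch classes in normal order, e.g. "4.7.11".
        return ("chord", [int(p) for p in name.split(".")], dur)
    return ("note", name, dur)


examples = ["C4$1.0", "4.7.11$0.5", "E-4$1/3", "NULL"]
parsed = [parse_token(t) for t in examples]
```

Each parsed tuple carries what the loop above needs to build a `note.Note`, a `chord.Chord`, or to advance the offset for a rest.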

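Source Code

Full code download link:

https://pan.baidu.com/s/1uPflHi1u6Vl_J_L7Q_JFaA

Extraction code: 8n1p

About the author: 李秋键 is a CSDN blog expert and the author of a CSDN Master Class course. He is a master's student at China University of Mining and Technology, and his work includes an award-winning entry in a TapTap competition.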
