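Julia - Dictionaries and Sets
So far, many of the functions we have seen work on arrays and tuples. Arrays are just one type of collection, but Julia has other types of collections as well. One of them is the Dictionary object, which associates keys with values. That is why it is called an "associative collection".
To understand it better, we can compare a dictionary with a simple lookup table. We supply a single piece of information, such as a number, string, or symbol, called the key. The table then gives us back the corresponding data value.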
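Creating a Dictionary
The syntax for creating a simple dictionary is as follows -
Dict("key1" => value1, "key2" => value2, …, "keyn" => valuen)
In the above syntax, key1, key2, …, keyn are the keys and value1, value2, …, valuen are the corresponding values. The operator => creates a Pair: key => value is equivalent to Pair(key, value). A dictionary cannot contain two identical keys, because keys in a dictionary are always unique.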
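The short sketch below shows what => actually produces. It also shows what happens when the same key is written twice in the constructor. The variable names p and d are illustrative only.

```julia
# `=>` builds a Pair object; Dict(...) accepts any number of such pairs.
p = "X" => 100
println(p == Pair("X", 100))   # true: the two spellings are equivalent
println(p.first, " ", p.second)

# Repeating a key does not create a second entry:
# the later pair overwrites the earlier one.
d = Dict("X" => 1, "X" => 2)
println(length(d), " ", d["X"])   # 1 2
```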
Example
julia> first_dict = Dict("X" => 100, "Y" => 110, "Z" => 220)
Dict{String,Int64} with 3 entries:
"Y" => 110
"Z" => 220
"X" => 100
We can also create a dictionary using comprehension syntax. An example is given below -
Example
julia> first_dict = Dict(string(x) => sind(x) for x = 0:5:360)
Dict{String,Float64} with 73 entries:
"320" => -0.642788
"65" => 0.906308
"155" => 0.422618
"335" => -0.422618
"75" => 0.965926
"50" => 0.766044
⋮ => ⋮
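For comparison, here is a smaller comprehension alongside a second construction that builds a Dict from two parallel collections with zip. The names squares, labels, scores and lookup are made up for this sketch.

```julia
# A Dict comprehension: one `key => value` pair per element of the range.
squares = Dict(x => x^2 for x in 1:5)

# zip yields (key, value) tuples, which the Dict constructor also accepts.
labels = ["a", "b", "c"]
scores = [10, 20, 30]
lookup = Dict(zip(labels, scores))

println(squares[3])    # 9
println(lookup["b"])   # 20
```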
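Keys
As mentioned earlier, the keys of a dictionary are unique. This means that assigning a value to a key that already exists does not create a new key. Instead, it replaces the value stored under the existing key. Following are some key-related operations on dictionaries -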
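A minimal sketch of the overwrite rule just described, using an illustrative dictionary d.

```julia
d = Dict("X" => 100)
d["X"] = 150      # "X" already exists, so its value is replaced
d["Y"] = 200      # "Y" is new, so a second entry is created
println(length(d))   # 2
println(d["X"])      # 150
```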
Searching for a key
We can use the haskey() function to check whether a dictionary contains a given key -
julia> first_dict = Dict("X" => 100, "Y" => 110, "Z" => 220)
Dict{String,Int64} with 3 entries:
"Y" => 110
"Z" => 220
"X" => 100
julia> haskey(first_dict, "Z")
true
julia> haskey(first_dict, "A")
false
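haskey() matters because indexing a dictionary with a missing key raises an error. The sketch below shows that error, and then get(), which returns a fallback value instead. The fallback 0 and the name missing_key_error are arbitrary choices for illustration.

```julia
d = Dict("X" => 100, "Y" => 110, "Z" => 220)

# Indexing with a key that is not present throws a KeyError ...
missing_key_error = try
    d["A"]
    nothing
catch err
    err
end
println(missing_key_error isa KeyError)   # true

# ... whereas get returns a fallback value instead of throwing.
println(get(d, "A", 0))    # 0
println(get(d, "Z", 0))    # 220
```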
Searching for a key/value pair
We can use the in() function to check whether a dictionary contains a given key/value pair -
julia> in(("X" => 100), first_dict)
true
julia> in(("X" => 220), first_dict)
false
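in() on a dictionary matches a whole key => value pair. To search by key alone or by value alone, apply in() to keys(d) or values(d). The parentheses around the pair are needed because => binds more loosely than in. The dictionary d here is illustrative.

```julia
d = Dict("X" => 100, "Y" => 110)

println(("X" => 100) in d)      # true: key and value both match
println(("X" => 110) in d)      # false: the key exists, but with another value
println("Y" in keys(d))         # true: search by key only
println(110 in values(d))       # true: search by value only
```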
Adding a new key/value pair
We can add a new key/value pair to an existing dictionary as follows -
julia> first_dict["R"] = 400
400
julia> first_dict
Dict{String,Int64} with 4 entries:
"Y" => 110
"Z" => 220
"X" => 100
"R" => 400
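Besides indexed assignment, two other Base functions can add entries. push!() takes a Pair. get!() inserts a default only when the key is missing. This is a sketch with an illustrative dictionary; the keys "R" and "S" and the defaults are arbitrary.

```julia
d = Dict("X" => 100)

# push! with a Pair has the same effect as d["R"] = 400.
push!(d, "R" => 400)

# get! returns the stored value, inserting the default only if the key is missing.
first_call  = get!(d, "S", 0)    # "S" is missing: stores 0 and returns 0
second_call = get!(d, "S", 99)   # "S" now exists: returns 0, ignores 99

println(d["R"], " ", first_call, " ", second_call, " ", length(d))  # 400 0 0 3
```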
Deleting a key
We can use the delete!() function to remove a key from an existing dictionary -
julia> delete!(first_dict, "R")
Dict{String,Int64} with 3 entries:
"Y" => 110
"Z" => 220
"X" => 100
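A related function, pop!(), removes a key and also returns the value that was stored under it. The sketch below also shows that delete!() on a missing key is simply a no-op. The key "nope" and the default -1 are illustrative.

```julia
d = Dict("X" => 100, "R" => 400)

v = pop!(d, "R")                 # removes "R" and returns its value (400)
delete!(d, "nope")               # deleting a missing key is a no-op, not an error
fallback = pop!(d, "nope", -1)   # pop! with a default avoids a KeyError

println(v, " ", haskey(d, "R"), " ", length(d), " ", fallback)   # 400 false 1 -1
```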
Getting all keys
We can use the keys() function to get all the keys of an existing dictionary -
julia> keys(first_dict)
Base.KeySet for a Dict{String,Int64} with 3 entries. Keys:
"Y"
"Z"
"X"
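As the KeySet output suggests, keys() returns a view of the dictionary rather than a copy. The sketch below shows that the view reflects later changes, and that collect() turns it into an ordinary array. The dictionary d is illustrative.

```julia
d = Dict("X" => 100, "Y" => 110)

ks = keys(d)              # a lazy view of the keys, not a copy
d["Z"] = 220
println(length(ks))       # 3: the view reflects the key added afterwards

key_vec = collect(keys(d))   # materialize the keys as a Vector{String}
println(sort(key_vec))       # ["X", "Y", "Z"]
```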
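Values
Every key in a dictionary has a corresponding value. Following are some value-related operations on dictionaries -
Retrieving all values
We can use the values() function to get all the values of an existing dictionary -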
julia> values(first_dict)
Base.ValueIterator for a Dict{String,Int64} with 3 entries. Values:
110
220
100
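The values iterator can be passed directly to reducing functions such as sum() and maximum(). For an unmodified dictionary, keys() and values() iterate in the same order. This sketch uses the same three entries as above.

```julia
d = Dict("X" => 100, "Y" => 110, "Z" => 220)

println(sum(values(d)))       # 430
println(maximum(values(d)))   # 220

# Zipping keys and values pairs each key with its own value.
same_order = all(d[k] == v for (k, v) in zip(keys(d), values(d)))
println(same_order)           # true
```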
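Dictionary as an iterable object
We can process each key/value pair in turn, which shows that a dictionary is in fact an iterable object -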
for kv in first_dict
println(kv)
end
"Y" => 110
"Z" => 220
"X" => 100
Here kv is a Pair holding one key/value pair on each iteration.
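Because each element is a Pair, the loop variable can be destructured into the key and the value directly. The helper entry_strings below is hypothetical, written only for this sketch. Wrapping the loop in a function also avoids Julia's global-scope rules for loops in scripts.

```julia
# Hypothetical helper: renders every entry as "key => value".
function entry_strings(d::AbstractDict)
    out = String[]
    for (k, v) in d            # each element is a Pair; (k, v) unpacks it
        push!(out, "$k => $v")
    end
    return out
end

d = Dict("X" => 100, "Y" => 110)
println(first(d) isa Pair)       # true
println(sort(entry_strings(d)))
```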
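Sorting a dictionary
A dictionary does not store its keys in any particular order, so iterating over it does not return the entries in sorted order. To get the items in order, we can sort the keys and then look up each value -
Example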
julia> first_dict = Dict("R" => 100, "S" => 220, "T" => 350, "U" => 400, "V" => 575, "W" => 670)
Dict{String,Int64} with 6 entries:
"S" => 220
"U" => 400
"T" => 350
"W" => 670
"V" => 575
"R" => 100
julia> for key in sort(collect(keys(first_dict)))
println("$key => $(first_dict[key])")
end
R => 100
S => 220
T => 350
U => 400
V => 575
W => 670
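The same result can be obtained by sorting the entries themselves. collect() turns the dictionary into an array of Pairs, and Pairs sort by key first. Passing by = last sorts on the value instead. The sample dictionary below is a smaller illustrative one.

```julia
d = Dict("R" => 100, "S" => 220, "T" => 350)

# collect turns the Dict into a Vector of Pairs; Pairs sort by key first.
by_key = sort(collect(d))

# by = last sorts on the value instead; rev = true puts the largest first.
by_value = sort(collect(d), by = last, rev = true)

println(first.(by_key))     # ["R", "S", "T"]
println(last.(by_value))    # [350, 220, 100]
```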
We can also use the SortedDict data type from the DataStructures.jl Julia package, which keeps the dictionary sorted by key at all times. Check the example below -
Example
julia> import DataStructures
julia> first_dict = DataStructures.SortedDict("S" => 220, "T" => 350, "U" => 400, "V" => 575, "W" => 670)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
"S" => 220
"T" => 350
"U" => 400
"V" => 575
"W" => 670
julia> first_dict["R"] = 100
100
julia> first_dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
"R" => 100
"S" => 220
"T" => 350
"U" => 400
"V" => 575
"W" => 670
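Word count example
One simple application of dictionaries is counting how many times each word appears in a text. The idea is that each distinct word is a key, and the value stored under that key is the number of times the word appears in the text.
In the following example, we count the words in a file named NLP.txt (saved on the Desktop) -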
julia> f = open("C://Users//Leekha//Desktop//NLP.txt")
IOStream()
julia> wordlist = String[]
String[]
julia> for line in eachline(f)
words = split(line, r"\W")
append!(wordlist, lowercase.(words))
end
julia> filter!(!isempty, wordlist)
984-element Array{String,1}:
"natural"
"language"
"processing"
"semantic"
"analysis"
"introduction"
"to"
"semantic"
"analysis"
"the"
"purpose"
……………………
……………………
julia> close(f)
As we can see from the output above, wordlist is now an array of 984 elements.
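The same tokenizing loop can also be written with the open(...) do form, which closes the file automatically when the block ends. In this sketch a temporary file created with mktemp() stands in for NLP.txt. Its two lines of text are made up for illustration.

```julia
# A temporary file stands in for NLP.txt (illustrative content only).
path, io = mktemp()
write(io, "Natural language processing.\nSemantic analysis!\n")
close(io)

wordlist = String[]
open(path) do f                      # f is closed automatically when the block ends
    for line in eachline(f)
        # split on non-word characters, then lowercase every piece
        append!(wordlist, lowercase.(split(line, r"\W")))
    end
end
filter!(!isempty, wordlist)          # drop empty strings left by punctuation

println(wordlist)
```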
We can create a dictionary to store the words and their counts -
julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64}()
julia> for word in wordlist
wordcounts[word]=get(wordcounts, word, 0) + 1
end
To find out how many times a word occurs, we can look the word up in the dictionary, as follows -
julia> wordcounts["natural"]
1
julia> wordcounts["processing"]
1
julia> wordcounts["and"]
14
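The counting idea can be packaged as a function that takes a string instead of reading a file. This sketch is handy for testing the logic without NLP.txt. The function name count_words and the sample sentence are hypothetical. Splitting on r"\W+" (one or more non-word characters) is a small variation on the r"\W" used above.

```julia
# Hypothetical helper: counts lowercase words in a string.
function count_words(text::AbstractString)
    counts = Dict{String,Int}()
    for w in split(lowercase(text), r"\W+")
        isempty(w) && continue               # split can leave empty strings at the ends
        counts[w] = get(counts, w, 0) + 1    # unseen words start from 0
    end
    return counts
end

wc = count_words("The cat and the hat. The end!")
println(wc["the"])     # 3
println(length(wc))    # 5 distinct words
```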
We can also print the words and their counts in alphabetical order of the words, as follows -
julia> for i in sort(collect(keys(wordcounts)))
println("$i, $(wordcounts[i])")
end
1, 2
2, 2
3, 2
4, 2
5, 1
a, 28
about, 3
above, 2
act, 1
affixes, 3
all, 2
also, 5
an, 5
analysis, 15
analyze, 1
analyzed, 1
analyzer, 2
and, 14
answer, 5
antonymies, 1
antonymy, 1
application, 3
are, 11
…
…
…
…
To find the most common words, we can use collect() to convert the dictionary into an array of Pairs, and then sort that array by value, as follows -
julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)
276-element Array{Pair{String,Int64},1}:
"the" => 76
"of" => 47
"is" => 39
"a" => 28
"words" => 23
"meaning" => 23
"semantic" => 22
"lexical" => 21
"analysis" => 15
"and" => 14
"in" => 14
"be" => 13
"it" => 13
"example" => 13
"or" => 12
"word" => 12
"for" => 11
"are" => 11
"between" => 11
"as" => 11
⋮
"each" => 1
"river" => 1
"homonym" => 1
"classification" => 1
"analyze" => 1
"nocturnal" => 1
"axis" => 1
"concept" => 1
"deals" => 1
"larger" => 1
"destiny" => 1
"what" => 1
"reservation" => 1
"characterization" => 1
"second" => 1
"certitude" => 1
"into" => 1
"compound" => 1
"introduction" => 1
We can view the top 10 words as follows -
julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)[1:10]
10-element Array{Pair{String,Int64},1}:
"the" => 76
"of" => 47
"is" => 39
"a" => 28
"words" => 23
"meaning" => 23
"semantic" => 22
"lexical" => 21
"analysis" => 15
"and" => 14
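When only the first few entries are needed, Base's partialsort() can replace a full sort followed by slicing. The small dictionary below reuses a few counts from the output above, for illustration only.

```julia
# A few counts taken from the output above (illustration only).
wc = Dict("the" => 76, "of" => 47, "is" => 39, "a" => 28, "and" => 14)

top3 = sort(collect(wc), by = last, rev = true)[1:3]

# partialsort only orders as much as needed for the requested range,
# which can save work when the dictionary is large.
top3_partial = partialsort(collect(wc), 1:3, by = last, rev = true)

println(first.(top3))            # ["the", "of", "is"]
println(top3_partial == top3)    # true
```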
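We can use the filter() function to find all words that start with a particular letter (for example "n"). The example below also keeps only the words that occur fewer than 4 times.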
julia> filter(tuple -> startswith(first(tuple), "n") && last(tuple) < 4, collect(wordcounts))
6-element Array{Pair{String,Int64},1}:
"none" => 2
"not" => 3
"namely" => 1
"name" => 1
"natural" => 1
"nocturnal" => 1
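filter() can also be applied to the dictionary itself, without collect(). In that case each entry reaches the predicate as a Pair, and the result is again a dictionary. The sample counts below are a small illustrative subset.

```julia
wc = Dict("none" => 2, "not" => 3, "natural" => 1, "the" => 76, "name" => 1)

# filter on a Dict passes each entry to the predicate as a Pair
# and returns a new Dict rather than a Vector.
rare_n = filter(p -> startswith(p.first, "n") && p.second < 4, wc)

println(typeof(rare_n) == typeof(wc))   # true
println(sort(collect(keys(rare_n))))    # ["name", "natural", "none", "not"]
```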
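Sets
Like an array or a dictionary, a set is a collection, but its elements are unique. Following are the differences between sets and other kinds of collections -
In a set, each element can appear only once.
The order of elements in a set is not significant.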
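Both properties can be seen by building a set from an array that contains duplicates. The values are illustrative.

```julia
s = Set([3, 1, 3, 2, 1])

println(length(s))              # 3: the repeated 3 and 1 are stored once
println(s == Set([1, 2, 3]))    # true: set equality ignores order
```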
Creating a set
With the help of the Set constructor, we can create a set as follows -
julia> var_color = Set()
Set{Any}()
We can also specify the element type of the set, as follows -
julia> num_primes = Set{Int64}()
Set{Int64}()
We can also create a set and fill it with elements at the same time, as follows -
julia> var_color = Set{String}(["red","green","blue"])
Set{String} with 3 elements:
"blue"
"green"
"red"
Alternatively, we can add elements to a set with the push!() function, just as with arrays -
julia> push!(var_color, "black")
Set{String} with 4 elements:
"blue"
"green"
"black"
"red"
We can use the in() function to check whether an element is in a set -
julia> in("red", var_color)
true
julia> in("yellow", var_color)
false
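A short sketch of a few more everyday set operations from Base. union!() adds several elements at once, delete!() removes one, and issubset() tests containment. The set primes and its contents are illustrative.

```julia
primes = Set{Int}()
union!(primes, [2, 3, 5])   # add several elements at once
push!(primes, 3)            # 3 is already present, so nothing changes
delete!(primes, 2)          # remove an element

println(length(primes))                    # 2
println(5 in primes, " ", 4 in primes)     # true false
println(issubset(Set([3]), primes))        # true
```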
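Standard operations
Union, intersection, and difference are some of the standard operations we can perform on sets. The corresponding functions are union(), intersect(), and setdiff().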
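A compact sketch of the three functions on small integer sets. It also shows the Unicode operator aliases ∪ and ∩, and intersect() with more than two arguments. The sets a, b and c are illustrative.

```julia
a = Set([1, 2, 3, 4])
b = Set([3, 4, 5])
c = Set([4, 5, 6])

println(union(a, b) == Set([1, 2, 3, 4, 5]))   # true
println(intersect(a, b) == Set([3, 4]))        # true
println(setdiff(a, b) == Set([1, 2]))          # true

# The Unicode operators ∪ and ∩ are aliases for union and intersect.
println(a ∪ b == union(a, b), " ", a ∩ b == intersect(a, b))   # true true

# union and intersect also accept more than two arguments.
println(intersect(a, b, c) == Set([4]))        # true
```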
Union
In general, the union operation returns every element that appears in at least one of the sets.
Example
julia> color_rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set{String} with 7 elements:
"indigo"
"yellow"
"orange"
"blue"
"violet"
"green"
"red"
julia> union(var_color, color_rainbow)
Set{String} with 8 elements:
"indigo"
"yellow"
"orange"
"blue"
"violet"
"green"
"black"
"red"
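Intersection
In general, the intersection operation takes two or more sets as input and returns the elements common to all of them.
Example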
julia> intersect(var_color, color_rainbow)
Set{String} with 3 elements:
"blue"
"green"
"red"
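Difference
In general, the difference operation takes two or more sets as input. It returns the elements of the first set, excluding any that also appear in the other set(s).
Example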
julia> setdiff(var_color, color_rainbow)
Set{String} with 1 element:
"black"
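Unlike union and intersection, difference is not symmetric: swapping the arguments changes the result. The sketch below also shows symdiff(), which keeps elements found in exactly one set. It ends with the in-place setdiff!(). The two color sets are illustrative.

```julia
a = Set(["red", "green", "black"])
b = Set(["red", "green", "blue"])

println(setdiff(a, b))   # contains only "black"
println(setdiff(b, a))   # contains only "blue": argument order matters

# symdiff keeps the elements that are in exactly one of the two sets.
println(symdiff(a, b) == Set(["black", "blue"]))   # true

# The ! variant modifies its first argument in place.
setdiff!(a, b)
println(a == Set(["black"]))   # true
```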
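Some functions on dictionaries
The following examples show that functions which work on arrays and sets also work on collections such as dictionaries. Note that union(), intersect(), and setdiff() treat each dictionary as a collection of key => value pairs and, as the output shows, return an array of Pairs rather than a dictionary -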
julia> dict1 = Dict(100=>"X", 220 => "Y")
Dict{Int64,String} with 2 entries:
100 => "X"
220 => "Y"
julia> dict2 = Dict(220 => "Y", 300 => "Z", 450 => "W")
Dict{Int64,String} with 3 entries:
450 => "W"
220 => "Y"
300 => "Z"
Union
julia> union(dict1, dict2)
4-element Array{Pair{Int64,String},1}:
100 => "X"
220 => "Y"
450 => "W"
300 => "Z"
Intersection
julia> intersect(dict1, dict2)
1-element Array{Pair{Int64,String},1}:
220 => "Y"
Difference
julia> setdiff(dict1, dict2)
1-element Array{Pair{Int64,String},1}:
100 => "X"
Merging two dictionaries
julia> merge(dict1, dict2)
Dict{Int64,String} with 4 entries:
100 => "X"
450 => "W"
220 => "Y"
300 => "Z"
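In the example above, the shared key 220 has the same value in both dictionaries, so the collision rule is not visible. The sketch below changes that value to a made-up "y2" to show that the later dictionary wins. It also shows the form of merge() that takes a combining function as its first argument. The count dictionaries are illustrative.

```julia
d1 = Dict(100 => "X", 220 => "Y")
d2 = Dict(220 => "y2", 300 => "Z")

m = merge(d1, d2)
println(m[220])          # y2: on a key collision, the value from the later Dict wins

# Passing a combining function as the first argument decides collisions explicitly.
counts1 = Dict("a" => 1, "b" => 2)
counts2 = Dict("b" => 10, "c" => 3)
println(merge(+, counts1, counts2) == Dict("a" => 1, "b" => 12, "c" => 3))   # true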
Finding the minimum element
julia> dict1
Dict{Int64,String} with 2 entries:
100 => "X"
220 => "Y"
julia> findmin(dict1)
("X", 100)
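As the output above suggests, findmin() on a dictionary compares the values, not the keys. It returns a (value, key) tuple; here "X" is the smallest value under string comparison, and 100 is its key. The sketch below unpacks that tuple and shows findmax(). It also shows minimum(keys(d)) for the case where the smallest key is what you actually want.

```julia
d = Dict(100 => "X", 220 => "Y")

v, k = findmin(d)           # smallest value and the key it is stored under
println(v, " ", k)          # X 100
println(findmax(d))         # ("Y", 220)
println(minimum(keys(d)))   # 100: the smallest key, if that is what you want instead
```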