C# 2.0 中Iterators的改进与实现原理浅析-.NET教程,C#语言-IDC资讯中心

c#语言从vb中吸取了一个非常实用的foreach语句。对所有支持ienumerable接口的类的实例，foreach语句使用统一的接口遍历其子项，使得以前冗长的for循环中繁琐的薄记工作完全由编译器自动完成。支持ienumerable接口的类通常用一个内嵌类实现ienumerator接口，并通过ienumerable.getenumerator函数，允许类的使用者如foreach语句完成遍历工作。

这一特性使用起来非常方便，但需要付出一定的代价。juval lowy发表在msdn杂志2004年第5期上的create elegant code with anonymous methods, iterators, and partial classes一文中，较为详细地介绍了c# 2.0中迭代支持和其他新特性。

首先，因为ienumerator.current属性是一个object类型的值，所以值类型(value type)集合在被foreach语句遍历时，每个值都必须经历一次无用的box和unbox操作；就算是引用类型(reference type)集合，在被foreach语句使用时，也需要有一个冗余的castclass指令，保障枚举出来的值进行类型转换的正确性。

以下为引用：

using system.collections;

public class tokens : ienumerable

{

…

tokens f = new tokens(…);

foreach (string item in f)

{

console.writeline(item);

}

…

}

上面的简单代码被自动转换为

以下为引用：

tokens f = new tokens(…);

ienumerator enum = f.getenumerator();

try

{

do {

string item = (string)enum.get_current(); // 冗余转换

console.writeline(item);

} while(enum.movenext());

}

finally

{

if(enum is idisposable) // 需要验证实现ienumerator接口的类是否支持idisposable接口

{

((idisposable)enum).dispose();

}

}

好在c# 2.0中支持了泛型(generic)的概念，提供了强类型的泛型版本ienumerable定义，伪代码如下：

以下为引用：

namespace system.collections.generic

{

public interface ienumerable<itemtype>

{

ienumerator<itemtype> getenumerator();

}

public interface ienumerator<itemtype> : idisposable

{

itemtype current{get;}

bool movenext();

}

}

这样一来即保障了遍历集合时的类型安全，又能够对集合的实际类型直接进行操作，避免冗余转换，提高了效率。

以下为引用：

using system.collections.generic;

public class tokens : ienumerable<string>

{

… // 实现 ienumerable<string> 接口

tokens f = new tokens(…);

foreach (string item in f)

{

console.writeline(item);

}

}

上面的代码被自动转换为

以下为引用：

tokens f = new tokens(…);

ienumerator<string> enum = f.getenumerator();

try

{

do {

string item = enum.get_current(); // 无需转换

console.writeline(item);

} while(enum.movenext());

}

finally

{

if(enum) // 无需验证实现ienumerator接口的类是否支持idisposable接口，

// 因为所有由编译器自动生成的ienumerator接口实现类都支持

{

((idisposable)enum).dispose();

}

}

除了遍历时的冗余转换降低性能外，c#现有版本另一个不爽之处在于实现ienumerator接口实在太麻烦了。通常都是由一个内嵌类实现ienumerator接口，而此内嵌类除了get_current()函数外，其他部分的功能基本上都是相同的，如

以下为引用：

public class tokens : ienumerable

{

public string[] elements;

tokens(string source, char[] delimiters)

{

// parse the string into tokens:

elements = source.split(delimiters);

}

public ienumerator getenumerator()

{

return new tokenenumerator(this);

}

// inner class implements ienumerator interface:

private class tokenenumerator : ienumerator

{

private int position = -1;

private tokens t;

public tokenenumerator(tokens t)

{

this.t = t;

}

// declare the movenext method required by ienumerator:

public bool movenext()

{

if (position < t.elements.length – 1)

{

position++;

return true;

}

else

{

return false;

}

}

// declare the reset method required by ienumerator:

public void reset()

{

position = -1;

}

// declare the current property required by ienumerator:

public object current

{

get // get_current函数

{

return t.elements[position];

}

}

}

…

}

内嵌类tokenenumerator的position和tokens实际上是每个实现ienumerator接口的类共有的，只是current属性的get函数有所区别而已。这方面c# 2.0做了很大的改进，增加了yield关键字的支持，允许代码逻辑上的重用。上面冗长的代码在c# 2.0中只需要几行，如

以下为引用：

using system.collections.generic;

public class tokens : ienumerable<string>

{

public ienumerator<string> getenumerator()

{

for(int i = 0; i<elements.length; i++)

yield elements[i];

}

…

}

getenumerator函数是一个c# 2.0支持的迭代块(iterator block)，通过yield告诉编译器在什么时候返回什么值，再由编译器自动完成实现ienumerator<string>接口的薄记工作。而yield break语句支持从迭代块中直接结束，如

以下为引用：

public ienumerator<int> getenumerator()

{

for(int i = 1;i< 5;i++)

{

yield return i;

if(i > 2)

yield break; // i > 2 时结束遍历

}

}

这样一来，很容易就能实现ienumerator接口，并可以方便地支持在一个类中提供多种枚举方式，如

以下为引用：

public class citycollection

{

string[] m_cities = {"new york","paris","london"};

public ienumerable<string> reverse

{

get

{

for(int i=m_cities.length-1; i>= 0; i–)

yield m_cities[i];

}

}

}

接下来我们看看如此方便的语言特性背后，编译器为我们做了哪些工作。以上面那个支持ienumerable<string>接口的tokens类为例，getenumerator函数的代码被编译器用一个类包装起来，伪代码如下

以下为引用：

public class tokens : ienumerable<string>

{

private sealed class getenumerator$00000000__ienumeratorimpl

: ienumerator<string>, ienumerator, idisposable

{

private int $pc = 0;

private string $_current;

private tokens <this>;

public int i$00000001 = 0;

// 实现 ienumerator<string> 接口

string ienumerator<string>.get_current()

{

return $_current;

}

bool ienumerator<string>.movenext()

{

switch($pc)

{

case 0:

{

$pc = -1;

i$00000001 = 0;

break;

}

case 1:

{

$pc = -1;

i$00000001++;

break;

}

default:

{

return false;

}

}

if(i$00000001 < <this>.elements.length)

{

$_current = <this>.elements[i$00000001];

$pc = 1;

return true;

}

else

{

return false;

}

}

// 实现 ienumerator 接口

void ienumerator.reset()

{

throw new exception();

}

string ienumerator.get_current()

{

return $_current;

}

bool ienumerator.movenext()

{

return ienumerator<string>.movenext(); // 调用 ienumerator<string> 接口的实现

}

// 实现 idisposable 接口

void dispose()

{

}

}

public ienumerator<string> getenumerator()

{

getenumerator$00000000__ienumeratorimpl impl = new getenumerator$00000000__ienumeratorimpl();

impl.<this> = this;

return impl;

}

}

从上面的伪代码中我们可以看到，c# 2.0编译器实际上维护了一个和我们前面实现ienumerator接口的tokenenumerator类非常类似的内部类，用来封装ienumerator<string>接口的实现。而这个内嵌类的实现逻辑，则根据getenumerator定义的yield返回地点决定。

我们接下来看一个较为复杂的迭代块的实现，支持递归迭代(recursive iterations)，代码如下：

以下为引用：

using system;

using system.collections.generic;

class node<t>

{

public node<t> leftnode;

public node<t> rightnode;

public t item;

}

public class binarytree<t>

{

node<t> m_root;

public void add(params t[] items)

{

foreach(t item in items)

add(item);

}

public void add(t item)

{

// …

}

public ienumerable<t> inorder

{

get

{

return scaninorder(m_root);

}

}

ienumerable<t> scaninorder(node<t> root)

{

if(root.leftnode != null)

{

foreach(t item in scaninorder(root.leftnode))

{

yield item;

}

}

yield root.item;

if(root.rightnode != null)

{

foreach(t item in scaninorder(root.rightnode))

{

yield item;

}

}

}

}

binarytree<t>提供了一个支持ienumerable<t>接口的inorder属性，通过scaninorder函数遍历整个二叉树。因为实现ienumerable<t>接口的不是类本身，而是一个属性，所以编译器首先要生成一个内嵌类支持ienumerable<t>接口。伪代码如下

以下为引用：

public class binarytree<t>

{

private sealed class scaninorder$00000000__ienumeratorimpl<t>

: ienumerator<t>, ienumerator, idisposable

{

binarytree<t> <this>;

node<t> root;

// …

}

private sealed class scaninorder$00000000__ienumerableimpl<t>

: ienumerable<t>, ienumerable

{

binarytree<t> <this>;

node<t> root;

ienumerator<t> ienumerable<t>.getenumerator()

{

scaninorder$00000000__ienumeratorimpl<t> impl = new scaninorder$00000000__ienumeratorimpl<t>();

impl.<this> = this.<this>;

impl.root = this.root;

return impl;

}

ienumerator ienumerable.getenumerator()

{

scaninorder$00000000__ienumeratorimpl<t> impl = new scaninorder$00000000__ienumeratorimpl<t>();

impl.<this> = this.<this>;

impl.root = this.root;

return impl;

}

}

ienumerable<t> scaninorder(node<t> root)

{

scaninorder$00000000__ienumerableimpl<t> impl = new scaninorder$00000000__ienumerableimpl<t>();

impl.<this> = this;

impl.root = root;

return impl;

}

}

因为scaninorder函数内容需要用到root参数，故而ienumerable<t>和ienumerator<t>接口的包装类都需要有一个root字段，保存传入scaninorder函数的参数，并传递给最终的实现函数。

实现ienumerator<t>接口的内嵌包装类scaninorder$00000000__ienumeratorimpl<t>实现原理与前面例子里的大致相同，不同的是程序逻辑大大复杂化，并且需要用到idisposable接口完成资源的回收。

以下为引用：

public class binarytree<t>

{

private sealed class getenumerator$00000000__ienumeratorimpl

: ienumerator<t>, ienumerator, idisposable

{

private int $pc = 0;

private string $_current;

private tokens <this>;

public int i$00000001 = 0;

public ienumerator<t> __wrap$00000003;

public ienumerator<t> __wrap$00000004;

public t item$00000001;

public t item$00000002;

public node<t> root;

// 实现 ienumerator<t> 接口

string ienumerator<t>.get_current()

{

return $_current;

}

bool ienumerator<t>.movenext()

{

switch($pc)

{

case 0:

{

$pc = -1;

if(root.leftnode != null)

{

__wrap$00000003 = <this>.scaninorder(root.leftnode).getenumerator();

goto scanleft;

}

else

{

goto getitem;

}

}

case 1:

{

return false;

}

case 2:

{

goto scanleft;

}

case 3:

{

$pc = -1;

if(root.rightnode != null)

{

__wrap$00000004 = <this>.scaninorder(root.rightnode).getenumerator();

goto scanright;

}

else

{

return false;

}

break;

}

case 4:

{

return false;

}

case 5:

{

goto scanright;

}

default:

{

return false;

}

scanleft:

$pc = 1;

if(__wrap$00000003.movenext())

{

$_current = item$00000001 = __wrap$00000003.get_current();

$pc = 2;

return true;

}

getitem:

$pc = -1;

if(__wrap$00000003 != null)

{

((idisposable)__wrap$00000003).dispose();

}

$_current = root.item;

$pc = 3;

return true;

scanright:

$pc = 4;

if(__wrap$00000004.movenext())

{

$_current = $item$00000002 = __wrap$00000004.get_current();

$pc = 5;

return true;

}

else

{

$pc = -1;

if(__wrap$00000004 != null)

{

((idisposable)__wrap$00000004).dispose();

}

return false;

}

}

// 实现 idisposable 接口

void dispose()

{

switch($pc)

{

case 1:

case 2:

{

$pc = -1;

if(__wrap$00000003 != null)

{

((idisposable)__wrap$00000003).dispose();

}

break;

}

case 4:

case 5:

{

$pc = -1;

if(__wrap$00000004 != null)

{

((idisposable)__wrap$00000004).dispose();

}

break;

}

}

}

}

}

通过上面的伪代码，我们可以看到，c# 2.0实际上是通过一个以$pc为自变量的有限状态机完成的递归迭代块，这可能是因为有限状态机可以很方便地通过程序自动生成吧。而dispose()函数则负责处理状态机的中间变量。

有兴趣进一步了解迭代特性的朋友，可以到grant ri的blog上阅读iterators相关文章。

在了解了iterators的实现原理后，再看那些讨论就不会被其表象所迷惑了 :d

C# 2.0 中Iterators的改进与实现原理浅析-.NET教程,C#语言

相关推荐

热门标签

热门文章

分类目录