While parsing an RSS feed with C# (Mono), I ran into the following error:
    System.FormatException: String was not recognized as a valid DateTime.
      at System.DateTimeParse.ParseExactMultiple (System.String s, System.String[] formats, System.Globalization.DateTimeFormatInfo dtfi, System.Globalization.DateTimeStyles style, System.TimeSpan& offset) [0x00053] in <638b7550331b4aebbce60a36c5915ef9>:0
      at System.DateTimeOffset.ParseExact (System.String input, System.String[] formats, System.IFormatProvider formatProvider, System.Globalization.DateTimeStyles styles) [0x00015] in <638b7550331b4aebbce60a36c5915ef9>:0
      at System.Xml.XmlConvert.ToDateTimeOffset (System.String s, System.String[] formats) [0x00015] in <e6894bb2955c4086a1f1fb894dfe9ec5>:0
      at System.ServiceModel.Syndication.Rss20ItemFormatter.FromRFC822DateString (System.String s) [0x00000] in <a0375ca868e54fb1905bc5b820910218>:0
      at System.ServiceModel.Syndication.Rss20ItemFormatter.ReadXml (System.Xml.XmlReader reader, System.Boolean fromSerializable) [0x0034c] in <a0375ca868e54fb1905bc5b820910218>:0
      at System.ServiceModel.Syndication.Rss20ItemFormatter.ReadFrom (System.Xml.XmlReader reader) [0x00025] in <a0375ca868e54fb1905bc5b820910218>:0
      at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItem (System.Xml.XmlReader reader, System.ServiceModel.Syndication.SyndicationFeed feed) [0x00005] in <a0375ca868e54fb1905bc5b820910218>:0
      at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadXml (System.Xml.XmlReader reader, System.Boolean fromSerializable) [0x003f8] in <a0375ca868e54fb1905bc5b820910218>:0
      at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFrom (System.Xml.XmlReader reader) [0x00025] in <a0375ca868e54fb1905bc5b820910218>:0
      at MyProj.MiscObserver.RssObserver.GetFeed (System.String uri, System.String timeFormat) [0x0001d] in <3bbcb6297b7043ecbfa704e1f9b35b12>:0
      at MyProj.MiscObserver.RssObserver.Run () [0x00086] in <3bbcb6297b7043ecbfa704e1f9b35b12>:0
The RSS 2.0 file being parsed looks like this:
    ...
    <item>
      <title>
        <![CDATA[ ... ]]>
      </title>
      <description>
        <![CDATA[ ... ]]>
      </description>
      <pubDate>Mon, 21 Aug 2017 08:38:00 GMT</pubDate>
      <guid>Something Interesting</guid>
      <link>Something Interesting</link>
    </item>
    ...
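Judging from the exception above, the problem is clearly in the parsing of pubDate. If so, it should be enough to replace the pubDate value, as it is read in, with a format that Mono can parse.
Because Rss20FeedFormatter gets its input from an XmlReader, we can override the XmlReader and add the replacement code there.
Here is one such override: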
    class MyXmlReader : XmlTextReader
    {
        private bool readingDate = false;
        const string CustomUtcDateTimeFormat = "ddd MMM dd HH:mm:ss Z yyyy"; // Wed Oct 07 08:00:07 GMT 2009

        public MyXmlReader(Stream s) : base(s) { }

        public MyXmlReader(string inputUri) : base(inputUri) { }

        public override void ReadStartElement()
        {
            if (string.Equals(base.NamespaceURI, string.Empty, StringComparison.InvariantCultureIgnoreCase) &&
                (string.Equals(base.LocalName, "lastBuildDate", StringComparison.InvariantCultureIgnoreCase) ||
                 string.Equals(base.LocalName, "pubDate", StringComparison.InvariantCultureIgnoreCase)))
            {
                readingDate = true;
            }
            base.ReadStartElement();
        }

        public override void ReadEndElement()
        {
            if (readingDate)
            {
                readingDate = false;
            }
            base.ReadEndElement();
        }

        public override string ReadString()
        {
            if (readingDate)
            {
                string dateString = base.ReadString();
                DateTime dt;
                if (!DateTime.TryParse(dateString, out dt))
                    dt = DateTime.ParseExact(dateString, CustomUtcDateTimeFormat, CultureInfo.InvariantCulture);
                return dt.ToUniversalTime().ToString("r", CultureInfo.InvariantCulture);
            }
            else
            {
                return base.ReadString();
            }
        }
    }
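The idea is simply to construct a MyXmlReader when reading the feed. The ToString("r") call above converts the date to an RFC 822 style string, which Mono was supposed to accept.
In practice, however, the code above did not work.
Looking at the corresponding part of the Mono source, I found that it does not read the content through the methods overridden above. It calls XmlReader.ReadElementContentAsString() instead, so that is the method I overrode next.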
    // Excerpt from the Mono source
    ...
    case "pubDate":
        Item.PublishDate = FromRFC822DateString (reader.ReadElementContentAsString ());
        continue;
    ...
    ...
    string [] rfc822formats = new string [] {
        "ddd, dd MMM yyyy HH:mm:ss 'Z'",
        "ddd, dd MMM yyyy HH:mm:ss zzz",
        "ddd, dd MMM yyyy HH:mm:ss"};

    // FIXME: DateTimeOffset is still incomplete. When it is done,
    // simplify the code.
    DateTimeOffset FromRFC822DateString (string s)
    {
        return XmlConvert.ToDateTimeOffset (s, rfc822formats);
    }
    ...
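As a side check, the three format strings from this excerpt can be tried directly against a few sample dates. The following is only a minimal sketch: the class name, the helper Matches and the sample strings are made up for illustration, and it passes DateTimeStyles.None, which may differ from the styles XmlConvert uses internally. The GMT form is expected to be rejected, because none of the three formats contains a zone name.

```csharp
using System;
using System.Globalization;

static class Rfc822FormatCheck
{
    // The three formats copied from the Mono excerpt above.
    static readonly string[] Rfc822Formats =
    {
        "ddd, dd MMM yyyy HH:mm:ss 'Z'",
        "ddd, dd MMM yyyy HH:mm:ss zzz",
        "ddd, dd MMM yyyy HH:mm:ss"
    };

    // Returns true when the string matches at least one of the formats.
    public static bool Matches(string s)
    {
        DateTimeOffset ignored;
        return DateTimeOffset.TryParseExact(
            s, Rfc822Formats, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out ignored);
    }

    static void Main()
    {
        string[] samples =
        {
            "Mon, 21 Aug 2017 08:38:00 GMT",    // the form used by the feed
            "Mon, 21 Aug 2017 08:38:00 Z",      // literal Z
            "Mon, 21 Aug 2017 08:38:00 +00:00"  // numeric offset
        };

        // Print whether each sample is accepted by any of the formats.
        foreach (string sample in samples)
            Console.WriteLine("{0} -> {1}", sample, Matches(sample));
    }
}
```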
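That still did not get rid of the runtime error. The definition of FromRFC822DateString at the bottom of the Mono excerpt shows that only three formats are actually accepted, which is not strict RFC 822: a zone name such as GMT matches none of them. Converting the date into one of those formats solves the problem. Here is the final code: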
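    using System;
    using System.Globalization;
    using System.IO;
    using System.Xml;

    class MyXmlReader : XmlTextReader
    {
        // Fallback format used when DateTime.TryParse fails,
        // e.g. "Mon, 21 Aug 2017 08:38:00 GMT".
        private readonly string _customUtcDateTimeFormat = "ddd, dd MMM yyyy HH:mm:ss 'GMT'";

        // Dates without zone information are treated as UTC.
        private const DateTimeStyles UtcStyles =
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal;

        public MyXmlReader(Stream s) : base(s) { }

        public MyXmlReader(string inputUri, string customTimeFormat = "") : base(inputUri)
        {
            // Keep the default format unless the caller supplies one.
            if (!string.IsNullOrEmpty(customTimeFormat))
                _customUtcDateTimeFormat = customTimeFormat;
        }

        public override string ReadElementContentAsString()
        {
            // Capture the element name before the base call moves past the element.
            string nodeName = this.LocalName;
            string data = base.ReadElementContentAsString();
            if (nodeName == "pubDate")
            {
                DateTime dt;
                if (!DateTime.TryParse(data, CultureInfo.InvariantCulture, UtcStyles, out dt))
                    dt = DateTime.ParseExact(data, _customUtcDateTimeFormat, CultureInfo.InvariantCulture, UtcStyles);
                string result = dt.ToUniversalTime().ToString("ddd, dd MMM yyyy HH:mm:ss 'Z'", CultureInfo.InvariantCulture);
                // With "ddd, dd MMM yyyy HH:mm:ss zzz" the result would be:
                //   Sun, 20 Aug 2017 05:12:00 GMT => Sun, 20 Aug 2017 05:12:00 +08:00
                // With "ddd, dd MMM yyyy HH:mm:ss 'Z'" the result is:
                //   Sun, 20 Aug 2017 04:01:00 GMT => Sun, 20 Aug 2017 04:01:00 Z
                return result;
            }
            return data;
        }
    }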
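For completeness, a reader like this is handed to Rss20FeedFormatter.ReadFrom, the call that appears in the stack trace. The sketch below is an assumption-laden illustration rather than the original project code: FeedLoader, Load and the in-memory feed are hypothetical, and a plain XmlReader is used so the snippet stands on its own. In the real program one would pass a MyXmlReader built from the feed URI instead.

```csharp
using System;
using System.IO;
using System.ServiceModel.Syndication;
using System.Xml;

static class FeedLoader
{
    // Reads an RSS 2.0 feed from any XmlReader. With the class above,
    // the argument would be `new MyXmlReader(uri)` instead of a plain reader.
    public static SyndicationFeed Load(XmlReader reader)
    {
        var formatter = new Rss20FeedFormatter();
        formatter.ReadFrom(reader);
        return formatter.Feed;
    }

    static void Main()
    {
        // Hypothetical in-memory feed whose pubDate already uses the 'Z' form.
        const string xml =
            "<rss version=\"2.0\"><channel><title>demo</title>" +
            "<link>http://example.com/</link><description>demo</description>" +
            "<item><title>hello</title>" +
            "<pubDate>Mon, 21 Aug 2017 08:38:00 Z</pubDate></item>" +
            "</channel></rss>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            SyndicationFeed feed = Load(reader);
            foreach (SyndicationItem item in feed.Items)
                Console.WriteLine("{0}: {1}", item.Title.Text, item.PublishDate);
        }
    }
}
```

Keeping the loading code typed against XmlReader means the date-fixing reader can be swapped in or out without touching the rest of the program.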
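PS1: It has been almost two months already...? It feels like these two months were mostly about gaining experience; the problem above is really the only one I could not find efficiently through a search engine...
PS2: Next up: serious studying for the graduate school entrance exam... x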
PS3:f7(eiki)
PS4: Maybe I should see whether I can put together a pull request...?
Update: I submitted the pull request, and it has been merged.