Easy way to extract urls out of a podcast rss feed?

Heyas.

I'm writing an app for a podcast I'm a host in.

I'd like to point the app to the RSS feed of the podcast and rip out any url from the RSS feed thus:
<enclosure type="audio/mpeg" url="http://techwebcast.podomatic.com/enclosure/2011-08-07T01_00_52-07_00.mp3" length="9218958"/>

Then try to play it inside Corona somehow.
The URL for the podcast RSS feed is:
http://techwebcast.podomatic.com/rss2.xml
Is there an easy way to download the xml and parse out just the mp3 without going into difficult XML parsing?

I'd ideally like to make a UI table of buttons to press to play each episode of the show.

Cheers! :)

If you want to list all the episodes and play them separately, then you have to parse the XML document. Let me see whether I can find you some simple code for that.

In the Corona SDK blog, there was a recent post about parsing XML files, and there are existing libraries that make it reasonably easy.

In fact I'm building an App for my older son's Music Blog and I've got it to a point where I can read the entries with podcasts and extract out the URL. I'm now stuck (translate to: being lazy) on creating a player to play it.

local xml = require( "xml" ).newParser()
 
local stories = {}
 
local networkListener = function( event )
        if ( event.isError ) then
                print ( "Network error - download failed" )
                stories[1] = { title = "Network Error", link = nil }
                return
        end

        print("Parsing the feed")
        local feed = xml:loadFile("index.rss", system.TemporaryDirectory)
        -- in an RSS 2.0 feed the first child of the root is <channel>,
        -- and the <item> entries are among its children
        local items = feed.child[1].child
        for i = 1, #items do
                local item = items[i]
                if item.name == "item" then -- we have a story batman!
                        -- fresh table per item, so fields from the previous
                        -- story never leak into this one
                        local story = {}
                        for j = 1, #item.child do
                                local child = item.child[j]
                                if child.name == "title" then
                                        story.title = child.value
                                elseif child.name == "link" then
                                        story.link = child.value
                                elseif child.name == "pubDate" then
                                        story.pubDate = child.value
                                elseif child.name == "content:encoded" then
                                        -- get the story body, one paragraph per child node
                                        local bodytag = child.child or {}
                                        story.body = ""
                                        for p = 1, #bodytag do
                                                if bodytag[p].value then
                                                        story.body = story.body .. bodytag[p].value .. "\n\n"
                                                end
                                        end
                                elseif child.name == "enclosure" then
                                        -- the podcast file lives in the tag's attributes
                                        local properties = child.properties
                                        story.podcastURL = properties.url
                                        story.podcastSize = properties.length
                                        story.podcastType = properties.type
                                end
                        end
                        -- print("[[" .. tostring(story.body) .. "]]")
                        stories[#stories + 1] = story
                end
        end
end
 
network.download("http://yoursite.com/feed/", "GET", networkListener, "index.rss", system.TemporaryDirectory )
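
If all you need is the mp3 URLs and nothing else from the feed, a plain Lua pattern match over the downloaded text may be enough. This is only a rough sketch: extractEnclosureURLs is a made-up helper name (not part of Corona or xml.lua), and it assumes the whole feed has already been read into a string and that each enclosure tag carries a quoted url attribute like the one in the original post.

```lua
-- Hypothetical helper: collect the url attribute of every <enclosure ...>
-- tag found in the raw feed text, without building an XML tree.
local function extractEnclosureURLs( feedText )
        local urls = {}
        -- grab the attribute string of each enclosure tag...
        for attributes in feedText:gmatch( "<enclosure(.-)>" ) do
                -- ...then pull out url="..." (or url='...')
                local url = attributes:match( "url%s*=%s*\"(.-)\"" )
                         or attributes:match( "url%s*=%s*'(.-)'" )
                if url then
                        urls[#urls + 1] = url
                end
        end
        return urls
end

-- sample text built from the enclosure tag quoted at the top of the thread
local sample = [[
<item><title>Episode 1</title>
<enclosure type="audio/mpeg" url="http://techwebcast.podomatic.com/enclosure/2011-08-07T01_00_52-07_00.mp3" length="9218958"/>
</item>
]]

for _, url in ipairs( extractEnclosureURLs( sample ) ) do
        print( url )
end
```

It won't give you titles or dates matched up with each episode, so for a proper episode list the parser approach above is still the way to go.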

Thanks for the help!

@rob

How do you display the rss feed, text or in a tableView -> slideView?

I'm using a widget.tableView, then I made my own screen to show the individual story.

I was just a few days ago thinking of trying this on my wordpress website, wanted to check this out for some time now but thought xml was too messy to work with after I played with json.

I think a sample like this should ship with Corona.

Yea, once you get used to JSON, XML is a pain in the pattotie. But RSS is XML, and that's what wordpress gives us for free, so...

I probably should release the whole project too. Probably more beneficial to the community as source than it is any form of IP.

That'd be SO AWESOME! :)

Okay. I fiddled around using Jon's tutorial and managed to PRINT the correct field values to the terminal.
Next, I'll get the lines of code above to download the RSS XML file to the temp dir to process.
After that is creating buttons to click to play them.
Coming along nicely.
Might grab some process from above to clean mine up a bit tho.

This could be a problem, however:

https://www.google.com/calendar/feeds/techwebcast%40gmail.com/public/basic

No file.xml to download. Hmm.

Oh! This might help. Will test later tonight:

destFilename

Well I did it. After I submitted the app to Apple, I added the source (Lua) files to the github repository (after making it generic).

No laughing at my code!!!!

https://github.com/robmiracle/rss.lua

I didn't provide any artwork. For the tab bar buttons, find it in code exchange and get your buttons from there. You can make your own bullet for your tableView controllers, provide your own fonts and such.

Now if you want me to do everything for you, you can paypal me some cash-oly! A guy's got to eat!

:-)

Enjoy
Rob

I downloaded your code, got all the images and tried it with the Ansca blog RSS feed, but I got this error:

ERROR: The resource file () could not be found at case-sensitive path (/var/folders/l0/td5hb8v560j8ncjymmw0btj00000gn/T/TemporaryItems/191/).
WARNING: Failed to find image()
Runtime error
        ?:0: attempt to index a nil value
stack traceback:
        [C]: ?
        ?: in function <?:4683>
        ?: in function 'renderItem'
        ?: in function 'sync'
        ...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:92: in function 'processRSSFeed'
        ...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:105: in function <...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:99>

Did you download the widget library files?

http://developer.anscamobile.com/content/widget

the zip file listed on that page needs to be unpacked in your project folder. I ran into an error and I ended up with:

+-->My Project/
       main.lua 
       widget_ios/
           tableView/
           uiButton/
       tableView/
       uiButton/

I got all the files from your git project, got widget_ios, got the graphics from the tabBar sample, created a 4th tab since you had that in your code and named it accordingly.

I had missed the "bullet image", but I made one myself with the same size.

So now I ran the sample again and got this error:

Runtime error
        ...60j8ncjymmw0btj00000gn/T/TemporaryItems/191/main.lua:81: ERROR: ads.init() requires a listener as the last argument.
stack traceback:
        [C]: ?
        [C]: in function 'init'
        ...60j8ncjymmw0btj00000gn/T/TemporaryItems/191/main.lua:81: in main chunk
Runtime error: ...60j8ncjymmw0btj00000gn/T/TemporaryItems/191/main.lua:81: ERROR: ads.init() requires a listener as the last argument.
stack traceback:
        [C]: ?
        [C]: in function 'init'
        ...60j8ncjymmw0btj00000gn/T/TemporaryItems/191/main.lua:81: in main chunk

That's odd. ads.init() doesn't require a call back.

What version are you building with? The ads API isn't available on iOS (Mac) until build 556 and for Android until 591 (Windows/Mac). If you're not running those builds or later, I would set the appID to nil (and for good measure set _G.appID to nil as well), so the code doesn't try to call the ad module.

@rob
Great work!
One thing you might wanna be aware of is that when the client writes a long title, it won't fit in the headlineBox. A way to fix that would be to have the text size adjust to the headlineBox bounds, if that is possible, or have the headline appear above the content inside of the textbox.

@holmes
I tried with the ansca blog and it worked for me.

I thought I was chopping the headline at around 22 characters since they are frequently too long in the tableView. On the individual pages, I gave it two lines worth of space, and with it being a native.textBox it has scrolling.
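
For anyone who wants to do the same chopping in their own list, something along these lines would work. It's a sketch only, not the code from the project: chopHeadline is a made-up name, the 22 is just the figure mentioned above, and it counts bytes, so a title with multi-byte UTF-8 characters could get cut in the middle of a character.

```lua
-- Hypothetical helper: shorten a headline so it fits a tableView row.
-- maxChars defaults to 22; the count is in bytes, not UTF-8 characters.
local function chopHeadline( title, maxChars )
        maxChars = maxChars or 22
        if #title <= maxChars then
                return title            -- short enough, leave it alone
        end
        return title:sub( 1, maxChars ) .. "..."
end

print( chopHeadline( "Short title" ) )
print( chopHeadline( "A very long podcast episode title indeed" ) )
```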

This is a first pass. I'm probably going to end up redoing it in Objective C so I can background the audio for the podcasts since I doubt that's on Ansca's roadmap anytime soon.

I just tested this on my website and it didn't work, which is really odd because I tried several other random sites and they worked fine. I tried a feedburner feed too, but that didn't fix the error. I think it can't parse the feed.

I'm a little too tired to fix the error now, I will deal with it tomorrow.

Before I forget: since there's a little delay while the feeds are downloading, maybe the feeds could be downloaded right at app start.

Regarding the errors above with ads.init(): it seems that the latest nightly build DID add a third parameter, a callback function, though it is supposed to be optional. If you're running build 605 then that could be the cause. I'm building with 600.

As for it working with some feeds and not others, all I can really do is suggest dropping in some print statements and making sure the console is open in case it's crashing; the error might help narrow down where to look.

For instance, I had to hack xml.lua to deal with some special characters and I may not have trapped them all. Or there could be other things in the RSS feed that cause that feed to not validate properly and could break the XML parser.

Check your feed in an XML or Feed validator such as:

http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.w3schools.com%2Fxml%2Fxml_validator.asp

and see if it sees problems with the feed.

As for the delay, I'm using the built-in test to see if the network is available. This is an async event handler, so I have to trigger the test and wait for it to come back saying "Yes you have a network"; if not, it picks up a cached version. Then it calls network.download, which is again async, so I have to wait on that callback too.

You can speed this up by getting rid of the network test. Also, if it's your wordpress blog, make sure that you have some caching software so it doesn't have to build the feed on the fly. I would hope most of the caching plugins would cache feeds too.

I ran the feed in the validator and it came out perfect, no problems at all. So I thought at first I must have made some changes by accident. Now here's the odd part... I put in the Ansca blog again just to check and it came out as this:

Runtime error
        ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/xml.lua:94: XmlParser: trying to close div with p
stack traceback:
        [C]: ?
        [C]: in function 'error'
        ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/xml.lua:94: in function 'ParseXmlText'
        ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/xml.lua:119: in function 'loadFile'
        ...560j8ncjymmw0btj00000gn/T/TemporaryItems/191/rss.lua:23: in function 'feed'
        ...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:59: in function 'processRSSFeed'
        ...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:98: in function <...8ncjymmw0btj00000gn/T/TemporaryItems/191/screen1.lua:92>

This is one of those "This isn't my problem" problems.

The error says that the XML parser is trying to close a <div> with a </p> tag.

It appears that the xml.lua file is actually trying to parse the HTML tags that are embedded in the <content:encoded> tag (the stuff enclosed in the <![CDATA[ ]]> tags). In that embedded HTML, there is apparently a <div> being opened without a closing tag.... or at least xml.lua believes that to be the case (I cut the HTML out and put it into firebug and didn't find problems with the HTML).

I have no clue where all that HTML is coming from. I would think RSS feeds would be more about the content than tons of markup.

I think we need to get either Jonathan BeeBee's or Alexander Makeev's take on the problem. XML should ignore anything in <![CDATA[ ]]> tags and shouldn't try and parse it. Since I didn't write the xml.lua file and only hacked it to deal with HTML entities, it's going to take me a while to decipher it.

EDIT:
Digging a bit further, the HTML embedded in the CDATA tags fails to validate. On line 47:

<em>Chickens Quest</em> has a total approximately 8000 lines of code! </span>&nbsp;</p>

Great find, I would never have gone looking in the rss feed since they all validated fine. But I just ran the Ansca blog and it was fine except for this:

"interoperability with the widest range of feed readers"

This doesn't sound good, to me that sounds like;

"yes you can walk but you need legs first"

When I burned my feed I checked for compatibility in the options in feedburner so mine should be fine but maybe I have that error too in my rss feed even though it validates.

After further investigations, I seem to have the same issue with my feed.
I think it's odd, I burn the feed, I validate and it is fine but still the xml is bad. Shouldn't feedburner or the Validator pick up such fault??

I'm not an XML genie myself so I can't go much further in this. There was xml built in to the old coronaUI and it seemed a lot easier to use.

I hope Ansca put their magic hands on XML and build it in to Corona making it a one line wonder thingy like with the rest of Corona.

Feedburner and other validators should ignore the content inside of CDATA tags. It's character data that is supposed to be passed verbatim to the reader.

The problem is that the xml.lua file, which Jonathan got from a LuaXML site, is trying to parse the HTML inside the CDATA tags when it shouldn't be.

So while the XML in the RSS is perfectly valid in this case, the HTML inside the CDATA tags is not.

I tried to look at xml.lua to see if I could easily skip parsing the data, but I haven't figured out what all is going on in that loop yet. I'm not very comfortable with Lua's string formatting codes and the original author didn't use the best variable names (Yea, we are all guilty of that too). So until I can find time to tear that apart (and to be honest, writing an XML parser for the community isn't high on my priority list) it would be helpful for some other eye-balls to take a peek and see if they can make sense of what's going on in that loop and find a way to just capture the CDATA blocks.

I removed some of my articles in my blog and now it worked, I narrowed it down to some code I had on the blog so I think it had some characters not supported in the xml parser.

...it was a snippet of Obj-C, allergic reaction from lua perhaps????

Now it works.

If there are specific characters, I can add (you can too) that in. Look around line 40 in xml.lua for all the:

if h == "8217" then return "'" end

type lines. If h has a string you want to get rid of, just put in some additional tests there.
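
If the list of those tests gets long, a lookup table is one way to keep it tidy. The following is a sketch only and is not how xml.lua is actually laid out: replaceNumericEntities and entityReplacements are made-up names, h is the entity number as a string (as in the line above), and the ASCII stand-ins are just my picks.

```lua
-- Hypothetical table: numeric entity code (as a string) -> plain replacement
local entityReplacements = {
        ["8216"] = "'",         -- left single quotation mark
        ["8217"] = "'",         -- right single quotation mark
        ["8220"] = "\"",        -- left double quotation mark
        ["8221"] = "\"",        -- right double quotation mark
        ["8211"] = "-",         -- en dash
}

-- Replace every &#NNNN; whose number is in the table.
local function replaceNumericEntities( text )
        return ( text:gsub( "&#(%d+);", function( h )
                -- returning nil makes gsub keep the original entity untouched
                return entityReplacements[h]
        end ) )
end

print( replaceNumericEntities( "It&#8217;s a &#8220;test&#8221;" ) )
```

Adding another character is then one more line in the table.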

But if you're getting closing tags not matching up with opening ones, that's going to take much more thought.

I've made some progress on the CDATA issue. I'm ripping it out at the moment, so it's no longer trying to parse the content. Now I have to figure out how to get it back in the table in the right place.
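
One way the rip-out-and-put-back idea could go, as a rough sketch under my own assumptions (this is not the fix that ended up in xml.lua): swap each CDATA section for a placeholder token before parsing, keep the raw content in a side table, then swap it back into the node values afterwards. stashCDATA, restoreCDATA and the @@CDATAn@@ token are all made-up names, and the token is assumed never to appear in a real feed.

```lua
-- Hypothetical pre-pass: replace each <![CDATA[ ... ]]> section with a
-- numbered placeholder so the parser never sees the embedded HTML.
local function stashCDATA( xmlText )
        local stash = {}
        local stripped = xmlText:gsub( "<!%[CDATA%[(.-)%]%]>", function( content )
                stash[#stash + 1] = content
                return "@@CDATA" .. #stash .. "@@"
        end )
        return stripped, stash
end

-- Hypothetical post-pass: put the raw content back into a parsed value.
local function restoreCDATA( value, stash )
        return ( value:gsub( "@@CDATA(%d+)@@", function( index )
                -- nil (unknown index) leaves the placeholder as it was
                return stash[tonumber( index )]
        end ) )
end

local raw = "<content:encoded><![CDATA[<div><p>Hi</div>]]></content:encoded>"
local stripped, stash = stashCDATA( raw )
print( stripped )
print( restoreCDATA( "@@CDATA1@@", stash ) )
```

The parser would be handed the stripped text, and restoreCDATA would be run on each value that comes back out of it.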

When this happens, apps like mine probably won't be able to use native.textBox to show the content, since it's likely to be filled with HTML. They will have to write the block out to a temp file and then load it in with a native.webPopUp.

Jonathan BeeBee fixed xml.lua to handle CDATA and we owe him mucho thanks!

I've updated my project with a few new things that might interest you.

xml.lua -- the new CDATA friendly version.
rss.lua -- needed a minor tweak to the handling of the content:encoded tag since I now get the whole entry instead of having to try and parse the paragraphs out of it.
webpage.lua -- a version of page.lua that uses native.webPopUp() to try and render the HTML content your content:encoded is likely to now contain. Still a bit buggy.

and . . . if you're an ATOM fan instead of RSS, a new... wait for it.. wait for it...

atom.lua -- processes atom based feeds.

and to show you how to use it, screen1.lua now does atom instead of RSS.

Have a little fun with it!

views:1965 update:2011/10/11 15:24:38
corona forums © 2003-2011