0

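I want to scrape usernames from YouTube comments, as on this page:

http://www.youtube.com/all_comments?v=mIA0W69U2_Y

I want to get every username/display name, such as "fedfields" and "mystik dread", together with the corresponding link (clicking "fedfields" takes you to that user's profile). I want to scrape them with an automated bash script, and I have the following questions.

1. My first approach is to write an automated script that downloads the page with wget and then processes it with regular expressions to extract the names. However, this means downloading the entire page, and each page is several MB once downloaded, so many pages take up a lot of space. Is there a better way?

2. There are many pages like the one linked (this video has 7 pages of comments). Is it possible to get everything on a single page?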

4

6 Answers

2

Use ScrapeGoat on Mashape to return all the usernames as a JSON object :)

https://www.mashape.com/warting/scrapegoat/

curl --include --request GET 'https://scrapegoat.p.mashape.com/?url=http%3A%2F%2Fwww.youtube.com%2Fall_comments%3Fv%3DmIA0W69U2_Y&selector=.yt-user-name' --header "X-Mashape-Authorization: <MASHAPE API KEY>"
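The query string in the curl command is just the target page URL and a CSS selector, percent-encoded. A minimal Python sketch of how that request URL can be built is shown here; the endpoint and the parameter names `url` and `selector` are copied from the command above, while the helper name `build_request_url` is hypothetical, and nothing here checks whether the service is still reachable.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are copied from the curl command above.
BASE = "https://scrapegoat.p.mashape.com/"


def build_request_url(page_url, selector):
    """Return the request URL with both parameters percent-encoded."""
    return BASE + "?" + urlencode({"url": page_url, "selector": selector})


request_url = build_request_url(
    "http://www.youtube.com/all_comments?v=mIA0W69U2_Y", ".yt-user-name"
)
print(request_url)
```

Building the URL with `urlencode` avoids hand-escaping characters such as `:`, `/`, `?` and `=` in the embedded page URL.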

Result:

{"message":"ok","payload":["whitehouse","Osambasucks2","Osambasucks2","Osambasucks2","omar barazanji","omar barazanji","omar barazanji","omar barazanji","omar barazanji","omar barazanji","HigherPlanes","HigherPlanes","HigherPlanes","RamonaFromPomona","RamonaFromPomona","Osambasucks2","Osambasucks2","Osambasucks2","RamonaFromPomona","terminator360tm","terminator360tm","terminator360tm","terminator360tm","terminator360tm","terminator360tm","Osambasucks2","Osambasucks2","Osambasucks2","Joe Lackey","Joe Lackey","Joe Lackey","ThaGenius101","ThaGenius101","ThaGenius101","Joe Lackey","Ed Patowski","Ed Patowski","Ed Patowski","toughdogyt","toughdogyt","toughdogyt","Osambasucks2","Osambasucks2","Osambasucks2","goodkarmaband","goodkarmaband","Martynas Valiukas","Martynas Valiukas","Martynas Valiukas","goodkarmaband","goodkarmaband","goodkarmaband","Martynas Valiukas","XRedstone688X","XRedstone688X","XRedstone688X","goodkarmaband","Trevor Jones","Trevor Jones","Trevor Jones","goodkarmaband","V V","V V","V V","V V","V V","V V","V V","V V","V V","V V","V V","V V","leeman6417","leeman6417","leeman6417","Osambasucks2","Osambasucks2","Osambasucks2","leeman6417","sosocrazy1234","sosocrazy1234","sosocrazy1234","leeman6417","liamdudeeee","liamdudeeee","liamdudeeee","sosocrazy1234","sosocrazy1234","sosocrazy1234","sosocrazy1234","leeman6417","Ed Patowski","Ed Patowski","Ed Patowski","mastershakelock","mastershakelock","mastershakelock","VGQgex","VGQgex","VGQgex","Osambasucks2","Osambasucks2","Osambasucks2","VGQgex","MindzEnt","MindzEnt","MindzEnt","William willie","William willie","William willie","William willie","William willie","William willie","bkdmd","bkdmd","bkdmd","Osambasucks2","Osambasucks2","Osambasucks2","bkdmd","Rafael Vargas","Rafael Vargas","Rafael Vargas","7even2wenty1","7even2wenty1","7even2wenty1","cashlessbread","cashlessbread","cashlessbread","base3798","base3798","base3798","Ed Patowski","Ed Patowski","Ed Patowski","base3798","john smith","john smith","john 
smith","Ed Patowski","Neftali Acosta","Neftali Acosta","Neftali Acosta","Ed Patowski","Ed Patowski","Ed Patowski","Neftali Acosta","john smith","john smith","john smith","Neftali Acosta","Canal YooCheckTheFloow","Canal YooCheckTheFloow","Canal YooCheckTheFloow","Abandonbeast","Abandonbeast","Abandonbeast","Canal YooCheckTheFloow","Ironcitytony72","Ironcitytony72","Ironcitytony72","john smith","john smith","john smith","Ironcitytony72","Andrew Apelt","Andrew Apelt","Andrew Apelt","Ironcitytony72","Osambasucks2","Osambasucks2","Osambasucks2","Andrew Apelt","Andrew Apelt","Andrew Apelt","Andrew Apelt","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Andrew Apelt","incas94","incas94","incas94","Osambasucks2","William willie","William willie","William willie","incas94","Osambasucks2","Osambasucks2","Osambasucks2","incas94","Osambasucks2","Osambasucks2","Osambasucks2","incas94","Osambasucks2","Osambasucks2","Osambasucks2","incas94","Andrew Apelt","Andrew Apelt","Osambasucks2","LawnMowerfromHell","LawnMowerfromHell","LawnMowerfromHell","Ironcitytony72","Osambasucks2","Osambasucks2","Osambasucks2","TheAndr3tzi","TheAndr3tzi","TheAndr3tzi","thumsupformyusername","thumsupformyusername","thumsupformyusername","algett","algett","algett","thumsupformyusername","thumsupformyusername","thumsupformyusername","thumsupformyusername","algett","ferkondenster","ferkondenster","ferkondenster","Christian Heinrich","Christian Heinrich","Christian Heinrich","erieejustice911","erieejustice911","erieejustice911","ferkondenster","ferkondenster","ferkondenster","Seth Farsides","Seth Farsides","Seth Farsides","ferkondenster","ferkondenster","ferkondenster","Seth Farsides","Seth Farsides","Seth Farsides","ferkondenster","Doky9889","Doky9889","Doky9889","ferkondenster","ferkondenster","ferkondenster","ferkondenster","Doky9889","sealrk19","sealrk19","sealrk19","wiljam12345","wiljam12345","wiljam12345","Dwayne Cole","Dwayne Cole","Dwayne 
Cole","Osambasucks2","Osambasucks2","Osambasucks2","Dwayne Cole","Jax Jr","Jax Jr","Jax Jr","Rafael Vargas","Rafael Vargas","Rafael Vargas","William willie","William willie","William willie","William willie","William willie","William willie","Gunnar Rowe","Gunnar Rowe","Gunnar Rowe","Rafael Vargas","Rafael Vargas","Rafael Vargas","Susan Porter","Susan Porter","Susan Porter","derp toth","derp toth","derp toth","MXNR16","nick62301","nick62301","nick62301","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","SeventhSun","SeventhSun","SeventhSun","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Rafael Vargas","Rafael Vargas","Rafael Vargas","senormierda","senormierda","senormierda","Rafael Vargas","chrisgilofficial","chrisgilofficial","chrisgilofficial","MXNR16","Osambasucks2","Osambasucks2","Osambasucks2","chrisgilofficial","chrisgilofficial","chrisgilofficial","chrisgilofficial","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","chrisgilofficial","chrisgilofficial","chrisgilofficial","chrisgilofficial","Osambasucks2","Andrew Apelt","Andrew Apelt","chrisgilofficial","Osambasucks2","Osambasucks2","Osambasucks2","chrisgilofficial","aztecadog","aztecadog","aztecadog","chrisgilofficial","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","chrisgilofficial","Osambasucks2","Osambasucks2","Osambasucks2","chrisgilofficial","ThePhase20","ThePhase20","ThePhase20","ICE778","ICE778","ICE778","Sabrina Blacks","Sabrina Blacks","Sabrina Blacks","Darwin Gutierrez","Darwin Gutierrez","Darwin 
Gutierrez","lessonsfromryan","tooncrazy1","tooncrazy1","tooncrazy1","unbreackable3000","unbreackable3000","unbreackable3000","Barack Obama","Barack Obama","Barack Obama","Osambasucks2","Osambasucks2","Osambasucks2","Barack Obama","tooncrazy1","tooncrazy1","tooncrazy1","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","tooncrazy1","Osambasucks2","Osambasucks2","Osambasucks2","tooncrazy1","Osambasucks2","Osambasucks2","Osambasucks2","Barack Obama","Americaunderduress","Americaunderduress","Americaunderduress","Barack Obama","Barack Obama","Barack Obama","Osambasucks2","Osambasucks2","Osambasucks2","Barack Obama","FoodStampBarry","FoodStampBarry","FoodStampBarry","Barack Obama","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","rondog ron","myviewsontheworld","myviewsontheworld","myviewsontheworld","SuperNikoYT","SuperNikoYT","SuperNikoYT","myviewsontheworld","Osambasucks2","Osambasucks2","Osambasucks2","myviewsontheworld","Americaunderduress","Americaunderduress","Americaunderduress","myviewsontheworld","Asuma741","Asuma741","Asuma741","RevolutionNewz","damonjo15","damonjo15","damonjo15","Osambasucks2","Osambasucks2","Osambasucks2","damonjo15","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","Osambasucks2"
,"Osambasucks2","Osambasucks2","Aries2012100","Aries2012100","Aries2012100","Aries2012100","Osambasucks2","tooncrazy1","tooncrazy1","tooncrazy1","Aries2012100","KH AK","KH AK","KH AK","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","Osambasucks2","Osambasucks2","Osambasucks2","Aries2012100","kangaroo3259","kangaroo3259","kangaroo3259","Aries2012100","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","youhan younen","youhan younen","youhan younen","tooncrazy1","tooncrazy1","tooncrazy1","youhan younen","Osambasucks2","Osambasucks2","Osambasucks2","youhan younen","Stevejobsultimate2","Stevejobsultimate2","Stevejobsultimate2","Stevejobsultimate2","Stevejobsultimate2","Stevejobsultimate2","Osambasucks2","Osambasucks2","Osambasucks2","Stevejobsultimate2","Rafael Vargas","Rafael Vargas","Rafael Vargas","drewpert0515","drewpert0515","drewpert0515","dv wfwefwe","TheAlienContactee","TheAlienContactee","TheAlienContactee","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","Jordan Beckwith","Jordan Beckwith","Jordan Beckwith","Michael Carrillo","Michael Carrillo","Michael Carrillo","gotwess","gotwess","gotwess","gotwess","Michael Carrillo","Michael Carrillo","Michael Carrillo","Michael Carrillo","gotwess","Jawad Pullin","Jawad Pullin","Jawad Pullin","TreborHG93","tooncrazy1","tooncrazy1","tooncrazy1","chickeneggchickeneg1","chickeneggchickeneg1","chickeneggchickeneg1","chickeneggchickeneg1","chickeneggchickeneg1","chickeneggchickeneg1","kinggrindhard","kinggrindhard","kinggrindhard","branoaas branoaas","branoaas branoaas","branoaas branoaas","Osambasucks2","Osambasucks2","Osambasucks2","branoaas branoaas","branoaas branoaas","branoaas branoaas","branoaas 
branoaas","Theindicud","Theindicud","Theindicud","eizieizz","eizieizz","eizieizz","Osambasucks2","Osambasucks2","Osambasucks2","eizieizz","1990Zuck","1990Zuck","1990Zuck","ArcoZakus","ArcoZakus","ArcoZakus","firemedic30ca","johnny grove","johnny grove","johnny grove","joost1v","joost1v","joost1v","Osambasucks2","Osambasucks2","Osambasucks2","joost1v","5sdk1","5sdk1","5sdk1","jeff brennan","jeff brennan","jeff brennan","izizdropshotz","izizdropshotz","izizdropshotz","izizdropshotz","jeff brennan","jeff brennan","jeff brennan","jeff brennan","Bo James","aztecadog","aztecadog","aztecadog","izizdropshotz","izizdropshotz","izizdropshotz","aztecadog","izizdropshotz","izizdropshotz","izizdropshotz","aztecadog","Greg Cimera","Greg Cimera","Greg Cimera","izizdropshotz","izizdropshotz","izizdropshotz","izizdropshotz","Greg Cimera","Greg Cimera","Greg Cimera","Greg Cimera","izizdropshotz","izizdropshotz","izizdropshotz","izizdropshotz","Greg Cimera","Greg Cimera","Greg Cimera","Greg Cimera","izizdropshotz","izizdropshotz","izizdropshotz","izizdropshotz","Greg Cimera","Greg Cimera","Greg Cimera","Greg Cimera","izizdropshotz","izizdropshotz","izizdropshotz","izizdropshotz","Greg Cimera","Greg Cimera","Greg Cimera","Greg Cimera","izizdropshotz","Paul Pascalau","Paul Pascalau","Paul Pascalau","Greg Cimera","tooncrazy1","tooncrazy1","tooncrazy1","Greg Cimera","Greg Cimera","Greg Cimera","Greg Cimera","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","Greg Cimera","Greg Cimera","Greg Cimera","Greg Cimera","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","Greg Cimera","Greg Cimera","Greg Cimera","Greg Cimera","tooncrazy1","tooncrazy1","tooncrazy1","tooncrazy1","Greg 
Cimera","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","aztecadog","aztecadog","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","aztecadog","aztecadog","aztecadog","Osambasucks2","izizdropshotz","izizdropshotz","izizdropshotz","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","Zajac Staszek","Zajac Staszek","Zajac Staszek","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Zajac Staszek","Zajac Staszek","Zajac Staszek","Zajac Staszek","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Zajac Staszek","Zajac Staszek","Zajac Staszek","Zajac Staszek","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Zajac Staszek","Osambasucks2","Osambasucks2","Osambasucks2","Zajac Staszek","Ed Patowski","Ed Patowski","Ed Patowski","Zajac Staszek","aztecadog","aztecadog","aztecadog","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","aztecadog","gotwess","gotwess","gotwess","aztecadog","JeremyTheMoose","JeremyTheMoose","JeremyTheMoose","5sdk1","5sdk1","5sdk1","fordbronco1991","fordbronco1991","fordbronco1991","andy kerver","andy kerver","andy kerver","Omarimage","Omarimage","Omarimage","Omarimage","Omarimage","Omarimage","justin lionti","justin lionti","justin lionti","Omarimage","Butheadbros2","Butheadbros2","Butheadbros2","Omarimage","moonbeamrider1","moonbeamrider1","moonbeamrider1","justin lionti","justin lionti","justin lionti","moonbeamrider1","moonbeamrider1","moonbeamrider1","moonbeamrider1","justin lionti","fordbronco1991","fordbronco1991","fordbronco1991","pellenyberg","pellenyberg","pellenyberg","Son Goku","Son Goku","Son 
Goku","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","Butheadbros2","Butheadbros2","Butheadbros2","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","Butheadbros2","Butheadbros2","Butheadbros2","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","Butheadbros2","Butheadbros2","Butheadbros2","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","Butheadbros2","Butheadbros2","Butheadbros2","Butheadbros2","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","5ilv3rbvll","Butheadbros2","Butheadbros2","Butheadbros2","Butheadbros2","5ilv3rbvll","fisch kopf","fisch kopf","fisch kopf","andrew baker","andrew baker","andrew baker","FVCKDA POPO","FVCKDA POPO","FVCKDA POPO","MrChessmans","MrChessmans","MrChessmans","BryndisiDali","Brazzer man","Brazzer man","Brazzer man","Jack Thompson","ecw141685","ecw141685","ecw141685","Osambasucks2","Osambasucks2","Osambasucks2","ecw141685","lps24evelyn","lps24evelyn","lps24evelyn","erieejustice911","erieejustice911","erieejustice911","erieejustice911","erieejustice911","erieejustice911","Keepskatin","Keepskatin","Keepskatin","erieejustice911","V V","V V","V V","Keepskatin","Abrahan Peraza","Abrahan Peraza","Abrahan Peraza","lexyloveful","Zratedguns","Zratedguns","Zratedguns","MadNoys1","MadNoys1","MadNoys1","MadNoys1","Zratedguns","MadNoys1","MadNoys1","MadNoys1","MadNoys1","MadNoys1","MadNoys1","MadNoys1","MadNoys1","MadNoys1","Joseph Pal","Joseph Pal","Joseph Pal","Joseph Pal","MadNoys1","MadNoys1","MadNoys1","MadNoys1","bear cat","laurynas stirbys","laurynas stirbys","laurynas stirbys","newjerusalem newtestament","newjerusalem newtestament","newjerusalem newtestament","amerilstones","amerilstones","amerilstones","newjerusalem newtestament","Keepskatin","Keepskatin","Keepskatin","newjerusalem newtestament","amerilstones","amerilstones","amerilstones","Keepskatin","Noah Neo","Noah Neo","Noah Neo","charmander4533","charmander4533","charmander4533","Noah Neo","Noah Neo","Noah Neo","Noah Neo","charmander4533","Noah Neo","Noah Neo","Noah 
Neo","charmander4533","Osambasucks2","Osambasucks2","Osambasucks2","Noah Neo","George Washington","George Washington","George Washington","charmander4533","izizdropshotz","izizdropshotz","izizdropshotz","charmander4533","Wavanova","Wavanova","Wavanova","charmander4533","wisestfoolalive","wisestfoolalive","wisestfoolalive","Noah Neo","Noah Neo","Noah Neo","Noah Neo","wisestfoolalive","colin dooley","colin dooley","colin dooley","colin dooley","colin dooley","colin dooley","Silme037","Silme037","Silme037","colin dooley","Keepskatin","Keepskatin","Keepskatin","colin dooley","princelord55","princelord55","princelord55","Osambasucks2","Osambasucks2","Osambasucks2","princelord55","DriadonRapShow","DriadonRapShow","DriadonRapShow","eddrum100","eddrum100","eddrum100","Ryan S","Osambasucks2","Osambasucks2","Osambasucks2","eddrum100","eddrum100","eddrum100","eddrum100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","eddrum100","Ryan S","Ryan S","Ryan S","eddrum100","eddrum100","eddrum100","Ryan S","Ryan S","Ryan S","Ryan S","Ryan S","eddrum100","eddrum100","eddrum100","eddrum100","RatedMForModz","RatedMForModz","RatedMForModz","alban97","alban97","alban97","RatedMForModz","Alex Bannon","Alex Bannon","Alex Bannon","alban97","alban97","alban97","alban97","Alex Bannon","james aaron","james aaron","james aaron","RatedMForModz","Ryan S","Ryan S","Ryan S","Dylan N","killllshot","killllshot","killllshot","Saadia Khan","Saadia Khan","talithatf17","talithatf17","talithatf17","amerilstones","amerilstones","amerilstones","talithatf17","BENGHAZIneverForget","BENGHAZIneverForget","BENGHAZIneverForget","talithatf17","talithatf17","talithatf17","supergrover6868","supergrover6868","supergrover6868","talithatf17","Alexander Sigsworth","Alexander Sigsworth","Alexander 
Sigsworth","supergrover6868","Zratedguns","Zratedguns","Zratedguns","supergrover6868","Keepskatin","Keepskatin","Keepskatin","Zratedguns","Butheadbros2","Butheadbros2","Butheadbros2","Zratedguns","Omegeist","Omegeist","Omegeist","supergrover6868","2Dmensions","2Dmensions","2Dmensions","talithatf17","talithatf17","talithatf17","supergrover6868","supergrover6868","supergrover6868","talithatf17","newjerusalem newtestament","newjerusalem newtestament","newjerusalem newtestament","supergrover6868","VGQgex","VGQgex","VGQgex","talithatf17","talithatf17","talithatf17","talithatf17","Mandragara","Mandragara","Mandragara","talithatf17","deathzbo","deathzbo","deathzbo","Mandragara","Mandragara","Mandragara","deathzbo","Mandragara","Mandragara","Mandragara","deathzbo","deathzbo","deathzbo","deathzbo","Mandragara","eddrum100","eddrum100","eddrum100","Mandragara","Mandragara","Mandragara","Mandragara","eddrum100","Unit01232","Unit01232","Unit01232","supergrover6868","supergrover6868","supergrover6868","Unit01232","Osambasucks2","Osambasucks2","Osambasucks2","supergrover6868","eddrum100","eddrum100","eddrum100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","eddrum100","Osambasucks2","Osambasucks2","Osambasucks2","Unit01232","Unit01232","Unit01232","Unit01232","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","Unit01232","eddrum100","eddrum100","eddrum100","senormierda","Osambasucks2","Osambasucks2","Osambasucks2","eddrum100","eddrum100","eddrum100","eddrum100","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","eddrum100","Kevin Koala","Kevin Koala","Kevin Koala","senormierda","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","GGRSC","GGRSC","GGRSC","GGRSC","eddrum100","michael smith","michael smith","michael 
smith","GGRSC","GGRSC","GGRSC","truthinvideos","supergrover6868","supergrover6868","supergrover6868","GGRSC","supergrover6868","supergrover6868","supergrover6868","eddrum100","eddrum100","eddrum100","supergrover6868","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","eddrum100","eddrum100","eddrum100","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","eddrum100","eddrum100","eddrum100","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","eddrum100","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","supergrover6868","eddrum100","eddrum100","eddrum100","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","eddrum100","eddrum100","eddrum100","eddrum100","bobothecreepyclown","eddrum100","eddrum100","eddrum100","bobothecreepyclown","supergrover6868","supergrover6868","supergrover6868","bobothecreepyclown","eddrum100","eddrum100","eddrum100","supergrover6868","supergrover6868","supergrover6868","supergrover6868","eddrum100","eddrum100","eddrum100","eddrum100","supergrover6868","bobothecreepyclown","bobothecreepyclown","bobothecreepyclown","willypdyer","willypdyer","willypdyer","Osambasucks2","Osambasucks2","Osambasucks2","willypdyer","spairtain","spairtain","sp
airtain","DigitalAcceptance","DigitalAcceptance","DigitalAcceptance","ElRancholo2","Osambasucks2","Osambasucks2","Osambasucks2","DigitalAcceptance","ElRancholo2","ElRancholo2","ElRancholo2","DigitalAcceptance","Osambasucks2","Osambasucks2","Osambasucks2","ElRancholo2","Mark Tse","Mark Tse","Mark Tse","DigitalAcceptance","Mark Tse","Mark Tse","Mark Tse","Mark Tse","The Best","The Best","The Best","supergrover6868","supergrover6868","supergrover6868","creativeengineer","creativeengineer","creativeengineer","eddrum100","Ed Patowski","Ed Patowski","Ed Patowski","creativeengineer","Ed Patowski","Ed Patowski","Ed Patowski","creativeengineer","creativeengineer","creativeengineer","creativeengineer","Ed Patowski","Ed Patowski","Ed Patowski","Ed Patowski","creativeengineer","eddrum100","eddrum100","eddrum100","creativeengineer","creativeengineer","creativeengineer","creativeengineer","eddrum100","Osambasucks2","Osambasucks2","Osambasucks2","creativeengineer","supergrover6868","supergrover6868","supergrover6868","creativeengineer","creativeengineer","creativeengineer","creativeengineer","supergrover6868","supergrover6868","supergrover6868","creativeengineer","comicozy87","comicozy87","comicozy87","Raven Gomez","turbidhat","turbidhat","turbidhat","Daracon1010","Daracon1010","Daracon1010","Daracon1010","turbidhat","turbidhat","turbidhat","Daracon1010","VGQgex","VGQgex","VGQgex","Daracon1010","Daracon1010","Daracon1010","Daracon1010","VGQgex","WeThePeopleNoNWO","WeThePeopleNoNWO","WeThePeopleNoNWO","amerilstones","zmanthecool","zmanthecool","zmanthecool","metal220","supergrover6868","supergrover6868","supergrover6868","1974wolfman","1974wolfman","1974wolfman","William willie","William willie","William willie","1974wolfman","1974wolfman","1974wolfman","1974wolfman","William willie","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Barskor1","Kanwar Judge","Kanwar Judge","Kanwar 
Judge","getsumdonginurmouth","getsumdonginurmouth","getsumdonginurmouth","abu bakr","abu bakr","abu bakr","Obamalies100","Obamalies100","Obamalies100","eddrum100","Obamalies100","Obamalies100","Obamalies100","eddrum100","eddrum100","eddrum100","eddrum100","Obamalies100","Obamalies100","Obamalies100","Obamalies100","eddrum100","eddrum100","eddrum100","eddrum100","Obamalies100","Obamalies100","Obamalies100","Obamalies100","eddrum100","Obamalies100","Obamalies100","Obamalies100","eddrum100","eddrum100","eddrum100","eddrum100","Obamalies100","Obamalies100","Obamalies100","Obamalies100","eddrum100","Obamalies100","Obamalies100","Obamalies100","eddrum100","eddrum100","eddrum100","eddrum100","Obamalies100","Obamalies100","Obamalies100","Obamalies100","eddrum100","Obamalies100","Obamalies100","Obamalies100","eddrum100","Obamalies100","Obamalies100","Obamalies100","eddrum100","Obamalies100","Obamalies100","Obamalies100","eddrum100","Obamalies100","Obamalies100","Obamalies100","getsumdonginurmouth","getsumdonginurmouth","getsumdonginurmouth","getsumdonginurmouth","Obamalies100","amerilstones","amerilstones","amerilstones","getsumdonginurmouth","getsumdonginurmouth","getsumdonginurmouth","getsumdonginurmouth","amerilstones","amerilstones","amerilstones","amerilstones","getsumdonginurmouth","Obamalies100","Obamalies100","Obamalies100","getsumdonginurmouth","Obamalies100","Obamalies100","Obamalies100","getsumdonginurmouth","Obamalies100","Obamalies100","Obamalies100","eddrum100","ThaYayo","ThaYayo","ThaYayo","William willie","chrisn365","chrisn365","chrisn365","Eli Jackson","Eli Jackson","Eli Jackson","Jboulos12","Frank Adams","Frank Adams","Frank 
Adams","amerilstones","amerilstones","amerilstones","eddrum100","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","eddrum100","eddrum100","eddrum100","amerilstones","amerilstones","amerilstones","amerilstones","eddrum100","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","supergrover6868","supergrover6868","supergrover6868","amerilstones","amerilstones","amerilstones","amerilstones","supergrover6868","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","amerilstones","amerilstones","amerilstones","amerilstones","Osambasucks2","LiamborninDC","LiamborninDC","LiamborninDC","Osambasucks2","Osambasucks2","
Osambasucks2","Osambasucks2","LiamborninDC","Osambasucks2","Osambasucks2","Osambasucks2","William willie","Osambasucks2","Osambasucks2","Osambasucks2","killllshot","killllshot","killllshot","killllshot","Osambasucks2","Osambasucks2","Osambasucks2","Osambasucks2","killllshot","killllshot","killllshot","killllshot","Osambasucks2","supergrover6868","supergrover6868","supergrover6868","killllshot","Osambasucks2","Osambasucks2","Osambasucks2","killllshot"],"status":200}
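The payload above lists each name once per matching element, so the same commenter appears many times. A small Python sketch of collapsing such a response into unique names follows; it assumes only the `payload` key seen in the result above, and the function name `unique_usernames` and the shortened `sample` string are illustrative.

```python
import json


def unique_usernames(response_text):
    """Parse a response shaped like the one above and drop repeated names,
    keeping the order in which each name first appears."""
    payload = json.loads(response_text)["payload"]
    # dict keys keep insertion order (Python 3.7+), so this de-duplicates stably.
    return list(dict.fromkeys(payload))


# Shortened stand-in for the full response shown above.
sample = (
    '{"message":"ok","payload":["whitehouse","Osambasucks2","Osambasucks2",'
    '"omar barazanji","Osambasucks2"],"status":200}'
)
print(unique_usernames(sample))
```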
answered 2013-04-24T15:08:03.920
2

You can use HtmlAgilityPack in a C# application.

        HtmlAgilityPack.HtmlWeb web = new HtmlWeb();
        HtmlAgilityPack.HtmlDocument doc = web.Load(Url);
        IEnumerable<HtmlNode> userNames = doc.DocumentNode.Descendants("a").Where(
            d => d.Attributes.Contains("class") &&   
            d.Attributes["class"].Value.Contains("yt-user-name"));

Useful information on parsing HTML with RegEx
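The same idea (walk the parsed anchors rather than regex the raw text) can be sketched with Python's standard library. This is an illustration only: the `yt-user-name` class comes from the answers in this thread, while the class name `UserLinkParser`, the sample markup and the `/user/fedfields` path are made up for the example. Unlike the C# snippet, which does a substring check on the class attribute, this sketch matches `yt-user-name` as a whole class token.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class UserLinkParser(HTMLParser):
    """Collect (display name, profile URL) pairs from <a class="yt-user-name">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.users = []
        self._href = None  # href of the matching anchor being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "yt-user-name" in classes:
            # Relative profile links are resolved against the page URL.
            self._href = urljoin(self.base_url, attrs.get("href") or "")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.users.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical markup; the real page structure may differ.
sample_html = '<p><a class="yt-user-name" href="/user/fedfields">fedfields</a> said hi</p>'
parser = UserLinkParser("http://www.youtube.com/")
parser.feed(sample_html)
print(parser.users)
```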

I'm not sure whether YouTube serves its content with native gzip compression, but you can check with the WebRequest class. If it does, traffic is reduced significantly.

HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.Method = WebRequestMethods.Http.Get;
webRequest.KeepAlive = true;
webRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
webRequest.Headers.Add("Accept-Encoding", "gzip,deflate");
HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse(); 
MessageBox.Show(webResponse.ContentEncoding.ToString());

You can then read the stream with HtmlAgilityPack and get the usernames.
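To give a feel for why compression matters here, the following Python sketch decodes a body according to its `Content-Encoding` value and compares raw and gzipped sizes for a repetitive HTML-like payload. It does not contact YouTube; the helper `decode_body` and the synthetic `page` bytes are assumptions for illustration, and the actual saving depends on the real page content.

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Decompress an HTTP body according to its Content-Encoding header value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        # Handles the zlib-wrapped form; raw deflate streams are not covered here.
        return zlib.decompress(raw)
    return raw  # no encoding declared, body is used as-is


# Synthetic, highly repetitive stand-in for a comment page.
page = b"<a class='yt-user-name'>fedfields</a>\n" * 2000
compressed = gzip.compress(page)
print(len(page), "bytes raw,", len(compressed), "bytes gzipped")
```

In a real client the request would carry `Accept-Encoding: gzip, deflate` and the response header would tell you which branch applies.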

answered 2012-12-20T11:23:51.740
0

C# can also help in this way (although HAP and WebRequest are better).

        object o = null; // passed for the optional Navigate arguments
        SHDocVw.InternetExplorer ie = new SHDocVw.InternetExplorerClass();
        WebBrowser wb = (WebBrowser)ie;
        wb.Visible = true;
        //Do anything else with the window here that you wish
        wb.Navigate("https://adwords.google.co.uk/um/Logout", ref o, ref o, ref o, ref o);
        // Wait until the page has finished loading
        while (wb.Busy) { Thread.Sleep(100); }
        HTMLDocument document = ((HTMLDocument)wb.Document);
        // Fill in the e-mail field
        IHTMLElement element = document.getElementById("Email");
        HTMLInputElementClass email = (HTMLInputElementClass)element;
        email.value = "testtestingtton@gmail.com";
        email = null;
        // Fill in the password field
        element = document.getElementById("Passwd");
        HTMLInputElementClass pass = (HTMLInputElementClass)element;
        pass.value = "pass";
        pass = null;
        // Click the sign-in button
        element = document.getElementById("signIn");
        HTMLInputElementClass subm = (HTMLInputElementClass)element;
        subm.click();
        subm = null;
answered 2014-06-10T09:56:16.627
0

Write RSS feeds for the name field and any other fields you want to extract, then set up a crawler using an automated plugin. Follow the steps in: How to extract data from multiple websites

answered 2014-10-27T10:36:51.683
0

Here is a simple solution using Ruby with the nokogiri gem and open-uri.

require 'nokogiri'
require 'open-uri'
url="https://www.youtube.com/all_comments?v=mIA0W69U2_Y"
dom=Nokogiri::HTML(URI.open(url)) # URI.open: plain open() no longer fetches URLs on Ruby 3.0+
dom.xpath("//div[@class='comment-entry']").each do |comment|
  username=comment.xpath(".//a[contains(@class,'user-name')]").first
  username=username.content.chomp.strip if username
  profilelink=comment.xpath(".//a[contains(@class,'user-name')]/@href").first
  profilelink=profilelink.content.chomp.strip if profilelink
  profilelink="http://www.youtube.com"+profilelink if profilelink && profilelink.match(/^\//)
  puts "#{username} #{profilelink}" if username and profilelink
end

For more details, see: How to easily extract data from multiple websites.

answered 2014-11-06T10:43:00.333
0

Do this:

import re
import sys
import time
import urllib2

html = True

argv_list = sys.argv
if len(argv_list) == 2:
    vid = argv_list[1]
else:
    vid = "mIA0W69U2_Y"

regex = re.compile("<span class=\"author.*?<a href=\"(.*?)\".*? dir=\"ltr\">(.*?)</a>", re.DOTALL | re.UNICODE | re.IGNORECASE)

index = 1
author_lists = []
t1 = time.time()
print "######################### Start #########################"

while 1:
    url = "http://www.youtube.com/watch_ajax?action_get_comments=1&v="+vid+"&commenttype=everything&source=w&page_size=500&p="+str(index)+"&format=XML"
    print "Retrieving page "+str(index)+": ", url
    o = urllib2.urlopen(url)
    r = o.read()
    elements = regex.findall(r)
    author_list = []
    for x, y in elements:

        if x.startswith("http://") or x.startswith("https://"):
            continue
        xx = "".join(["http://www.youtube.com", x])
        href = xx.strip()
        #print href


        if "</span>" not in y :
            uname = y.strip()
        else:
            uname = y.split("</span>")[0].strip()

        if uname.startswith("<a"):
            continue

        if not uname or not href:
            continue

        if html:
            #1 output html
            author = "".join(["<a href=\"", href, "\">", uname, "</a>"])
        else:
            #2 output txt
            author = " ".join([uname, href])

        author_list.append(author)

    t = "%02d:%02d:%02d" % reduce(lambda ll,b : divmod(ll[0],b) + ll[1:], [(time.time()-t1,),60,60])
    print "".join(["Time passed: ", t])
    if not author_list:
        break
    else:
        author_lists.extend(author_list)
    index+=1
    #break #uncomment it if you only want to test one page

print "######################### Finished #########################"
print "Total comments: ", len(author_lists)
if author_lists:
    author_lists.sort()
    last = author_lists[-1]
    for i in range(len(author_lists)-2, -1, -1):
        if last == author_lists[i]:
            del author_lists[i]
        else:
            last = author_lists[i]
    if html:
        authors = "<br>".join(author_lists)
        authors = "".join(["<html><meta http-equiv='Content-Type' content='text/html; charset=utf-8'><body>", authors, "</body></html>"])
        fname = vid+".html"
    else:
        authors = "\n".join(author_lists)
        fname = vid+".txt"

    #print "Authors: ", authors
    print "Total commenters: ", len(author_lists)



    oo = open(fname, "w")
    oo.write(authors)
    oo.close()
print "######################### Exit #########################"

Example of txt output:

[image]

Example of HTML output:

[image]
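The core of the script is a loop that requests page 1, 2, 3, ... until a page yields no authors, followed by removal of duplicates. A compact Python 3 sketch of just that control flow follows. It swaps the sort-then-delete pass for a `set`, and the download-and-regex step is replaced by a caller-supplied `fetch_page` function; `collect_authors`, `max_pages` and the `fake_pages` data are hypothetical names introduced for the example.

```python
def collect_authors(fetch_page, max_pages=1000):
    """Call fetch_page(1), fetch_page(2), ... until a page yields no authors,
    then return the distinct authors in sorted order."""
    seen = set()
    for index in range(1, max_pages + 1):
        authors = fetch_page(index)
        if not authors:
            break  # an empty page marks the end of the comments
        seen.update(authors)
    return sorted(seen)


# Stand-in for the network step: (display name, profile path) pairs per page.
fake_pages = {
    1: [("fedfields", "/user/fedfields"), ("mystik dread", "/user/mystikdread")],
    2: [("fedfields", "/user/fedfields")],
}

result = collect_authors(lambda index: fake_pages.get(index, []))
print(result)
```

Passing the fetch step in as a function keeps the paging logic testable without network access; `max_pages` is only a safety cap against an endpoint that never returns an empty page.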

answered 2013-04-25T10:56:16.920