Applying What We Learned

One option I discovered while debugging the XPath expression for Stock Availability was to copy the XPath directly from the HTML in Google Chrome, as follows:

  1. With the element inspector arrow, select the element you want to scrape. This takes you to that point in the HTML so you can examine its path.
  2. Right-click that portion of the HTML, select Copy, then Copy XPath.
  3. Paste it into the console; you only need to adapt the format to get the output you want.

IMPORTANT: This method can cost you readability, which may make it harder for a teammate to collaborate and harder for you to maintain the code later. I would treat it as a complement rather than a definitive way to write XPath.

There is no single way to solve the challenge. In fact, the expression can be shortened considerably if we identify only what is essential, as with the stock in the following example:

//Description
$x('//article[@class="product_page"]/p/text()').map(x=>x.wholeText)

//Stock available
$x('//article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()[2]').map(x=>x.wholeText)
$x('//p[@class="instock availability"]/text()[2]').map(x=>x.wholeText)

Stock:

$x('//table[@class="table table-striped"]/tbody/tr[6]/td/text()').map(x => x.wholeText)

Description:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText.trim())

SUMMARY:

  • Write XPath expressions for:
  • Description:
$x('//article/p/text()').map(x => x.wholeText)
["It's hard to imagine a world without A Light in th…for you. Shel, you never sounded so good. ...more"]
  • Stock:
$x('//article[@class="product_page"]//p[@class="instock availability"]/text()').map(x=>x.wholeText)

  • Price:
$x('//article[@class="product_pod"]/div[@class="product_price"]/p[@class="price_color"]/text()').map(x=>x.wholeText)
// Returns: (20) ["£51.77", "£53.74", "£50.10", "£47.82", "£54.23", "£22.65", "£33.34", "£17.93", "£22.60", "£52.15", "£13.99", "£20.66", "£17.46", "£52.29", "£35.02", "£57.25", "£23.88", "£37.59", "£51.33", "£45.17"]
  • Left sidebar categories:
$x('//div[@class="side_categories"]//a[@href]/text()').map(x=>x.wholeText)

// examples from https://books.toscrape.com/index.html

$x('//article[@class="product_pod"]/h3/a/@title') // fetches all the book titles

$x('//article[@class="product_pod"]/h3/a/@title').map(x => x.value)

$x('//article[@class="product_pod"]/div[@class="product_price"]/p[@class="price_color"]/text()') // all the book prices

$x('//article[@class="product_pod"]/div[@class="product_price"]/p[@class="price_color"]/text()').map(x => x.wholeText)

$x('//div[@class="side_categories"]/ul[@class="nav nav-list"]/li/ul/li/a/text()') // all the book categories

$x('//div[@class="side_categories"]/ul[@class="nav nav-list"]/li/ul/li/a/text()').map(x => x.wholeText.trim())

// CHALLENGE

/*
Extract the book description

https://books.toscrape.com/catalogue/sapiens-a-brief-history-of-humankind_996/index.html
*/
$x('//article[@class="product_page"]/p[position()=1]/text()')

$x('//article[@class="product_page"]/p[position()=1]/text()')[0].wholeText


/*
Extract the book stock
*/

$x('//table[@class="table table-striped"]/tbody/tr/td/.')[5]

$x('//table[@class="table table-striped"]/tbody/tr[6]/td/text()')[0].wholeText

To get clean book categories we can use a regular expression.

$x('//div[@class="side_categories"]/ul[@class="nav nav-list"]/li/ul/li/a/text()').map(i => i.data.replace(/[_\W]+/g, ""))
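One thing worth checking with the regular expression above is that it removes every run of non-word characters, including the space inside multi-word names. A minimal Python sketch of the difference, assuming made-up sample strings shaped like the category text nodes (the `raw` list is hypothetical, not scraped output):

```python
import re

# Hypothetical raw text nodes, shaped like the sidebar category links
raw = ["\n        Travel\n    ", "\n        Historical Fiction\n    "]

# Same pattern as replace(/[_\W]+/g, ""); re.ASCII mirrors JavaScript's ASCII-only \W
squeezed = [re.sub(r"[_\W]+", "", s, flags=re.ASCII) for s in raw]

# strip() only trims both ends, like JavaScript's trim()
trimmed = [s.strip() for s in raw]

print(squeezed)  # ['Travel', 'HistoricalFiction']
print(trimmed)   # ['Travel', 'Historical Fiction']
```

If the inner spaces matter, trimming is probably the safer choice.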

Description

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Stock

$x('//article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText.trim())

These were my results. I'm sharing them in case anyone wants to compare, or solved it a different way.

Stock Available:

$x('//article[@class = "product_page"]/div[@class = "row"]/div[@class = "col-sm-6 product_main"]/p[@class = "instock availability"]/text()[last()]').map(x => x.wholeText.trim())

Product Description

$x('//article[@class = "product_page"]/p/text()').map(x => x.wholeText)

Here is how mine came out.

To return the stock:
$x('//p[@class="instock availability"]/text()').map(x => x.wholeText)

To get the description:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

In the last exercise you can use $x('//div[@class="side_categories"]/ul[@class="nav nav-list"]/li/ul/li/a/text()').map(x => x.wholeText.trim()) to get the text without unnecessary whitespace.

I'd like to add that in the first example, when the instructor takes the book title, he reads it with the value property because the node comes straight from the title="book title" attribute.

In the second case, since the text comes from inside the p element, text() is used.
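The same attribute-versus-text distinction can be sketched outside the browser. This is only an illustration with Python's standard-library ElementTree on a hypothetical, simplified product card (the `snippet` markup is an assumption, not the real page source):

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modeled on one product card
snippet = (
    '<article class="product_pod"><h3>'
    '<a href="#" title="A Light in the Attic">A Light in the ...</a>'
    '</h3></article>'
)
root = ET.fromstring(snippet)
link = root.find("./h3/a")

full_title = link.get("title")  # attribute value, like @title read through .value
visible_text = link.text        # text node, like text() read through .wholeText

print(full_title)    # A Light in the Attic
print(visible_text)  # A Light in the ...
```

In this sketch the attribute carries the full title while the visible text is shortened, which is one reason to prefer the attribute.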

Something interesting I noticed while doing the challenge: if an element has two classes, for example "col-sm-6 product_main", an XPath equality test such as @class="col-sm-6 product_main" must include the full attribute value with both classes. CSS selectors can match on just one of them; in XPath, matching on a single class requires something like contains(@class, "product_main").

Challenges XPath 🕵️

Extract the description of a book 📑

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

“WICKED above her hipbone, GIRL across her heart Words are like a road map …”

Extract stock from a book 📑

$x('//div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()[2]').map(x => x.wholeText)


Result= ['\n \n In stock (20 available)\n \n']

I did it like this:

$x('//p[@class="instock availability" and contains(., "stock")] | //div[@id="product_description"]/following-sibling::p').map(x=>x.innerText)

And I got an array with both values:

Array [ " In stock (22 available)", "It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love th It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love that Silverstein. Need proof of his genius? RockabyeRockabye baby, in the treetopDon't you know a treetopIs no safe place to rock?And who put you up there,And your cradle, too?Baby, I think someone down here'sGot it in for you. Shel, you never sounded so good. ...more" ]

By the way, here I used the innerText property of the DOM nodes to get the information in a format closer to what is displayed on screen.

Another way to write it:
I've been advised to always build my XPath expressions by going straight to the element I want to extract, because if I spell out the whole route down to the node, the expression is likely to stop working in the future.
Why?
Pages get adapted over time, and the longer the expression, the more likely a change somewhere along the path will break it.

What do you think?
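The fragility argument can be illustrated with a small Python sketch. It uses the standard-library ElementTree (which supports only a subset of XPath) on a hypothetical, heavily simplified page; both markup strings and the imagined redesign are assumptions made for the example:

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified product page
ORIGINAL = (
    "<html><body><div><article><div>"
    "<p class='instock availability'>In stock (22 available)</p>"
    "</div></article></div></body></html>"
)
# Same content after an imagined redesign that adds one <section> wrapper
REDESIGNED = ORIGINAL.replace("<article>", "<section><article>").replace(
    "</article>", "</article></section>"
)

LONG_PATH = "./body/div/article/div/p[@class='instock availability']"
SHORT_PATH = ".//p[@class='instock availability']"

def matches(markup, path):
    """Return the text of every element that `path` selects in `markup`."""
    return [el.text for el in ET.fromstring(markup).findall(path)]

print(matches(ORIGINAL, LONG_PATH))     # ['In stock (22 available)']
print(matches(REDESIGNED, LONG_PATH))   # []
print(matches(REDESIGNED, SHORT_PATH))  # ['In stock (22 available)']
```

The short expression survives the extra wrapper, while the step-by-step path silently returns nothing.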

For those who use Chrome's "Copy XPath" and have trouble with whitespace in the text, add .trim() at the end.

For the category links, for example:

$x('/html/body/div/div/div/aside/div[2]/ul/li/ul/li/a/text()').map(x => x.wholeText.trim())
  1. Pick the site to scrape.
  2. Find the node of interest.
  3. Check which attribute holds the value.
  4. Identify the parent node to start from.
  5. $x('Expression[Predicates]/nodes/node_to_extract/@attribute')

Manual case

$x('//div/div[@class="page_inner"]/div[@class="content"]/div[2]/article/div[1]/div[last()]/p[2]/text()').map(x => x.wholeText)

Browser-assisted case

$x('//*[@id="content_inner"]/article/div[1]/div[2]/p[2]/text()').map(x => x.wholeText)

The result is the same.

Scraping https://books.toscrape.com/catalogue/set-me-free_988/index.html

$x('//div[@class = "content"]/div[@id = "content_inner"]/article[@class = "product_page"]/p/text()').map(x => x.wholeText)

$x('//div[@class = "content"]/div[@id = "content_inner"]/article[@class = "product_page"]/div[@class = "row"]/div[@class = "col-sm-6 product_main"]/p[@class = "instock availability"]/text()').map(x => x.wholeText)

A short option for the titles could be this:

//a[@title]/@title

The XPath expressions for the description and the stock:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

$x('//table[@class="table table-striped"]/tbody/tr[6]/td/text()').map(x => x.wholeText)
$x('//article[@class="product_page"]/p/text()').map(x=>x.wholeText)
["It's hard to imagine a world without A Light in th…for you. Shel, you never sounded so good. ...more"]
$x('//table[@class="table table-striped"]/tbody/tr/td[starts-with(.,"In")]/text()').map(x=>x.wholeText)
["In stock (22 available)"]
# Stock
$x('//table[@class="table table-striped"]/tbody/tr[contains (., "Availability")]/td/text()').map(x => x.wholeText)

R/: ["In stock (22 available)"]

# Description
$x('//meta[@name="description"]/@content').map(x => x.value)

R/: ["↵    It's hard to imagine a world without A Light …r you. Shel, you never sounded so good. ...more↵"]
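Selecting the stock row by its label, as the contains(., "Availability") expression above does, tends to hold up better than a fixed row number. A rough Python sketch with the standard-library ElementTree, assuming a shortened, hypothetical version of the product table (ElementTree has no contains(), so the sketch uses its exact-match predicate `[th='Availability']` instead):

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the product information table
table = ET.fromstring(
    "<table class='table table-striped'><tbody>"
    "<tr><th>UPC</th><td>abc123</td></tr>"
    "<tr><th>Price</th><td>51.77</td></tr>"
    "<tr><th>Availability</th><td>In stock (22 available)</td></tr>"
    "<tr><th>Number of reviews</th><td>0</td></tr>"
    "</tbody></table>"
)

# By fixed position: breaks if a row is added or removed above it
by_position = table.find("./tbody/tr[3]/td").text
# By label: keeps working as long as the header text stays the same
by_label = table.find("./tbody/tr[th='Availability']/td").text
# Counting from the end, like tr[last()-1]
by_last = table.find("./tbody/tr[last()-1]/td").text

print(by_position, by_label, by_last, sep="\n")
```

All three select the same cell here; only the label-based one does not depend on how many rows the table has.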

Practice:
To get the title:

$x('//h3/a/text()').map(x=>x.wholeText)

To get the price:

$x('//p[@class="price_color"]/text()').map(x=>x.wholeText)

To get the description:

$x('//article[@class="product_page"]/p/text()').map(x=>x.wholeText)

To get the stock:

$x('//p[@class="instock availability"]/text()').map(x=>x.wholeText)


$x('//div[@id="content_inner"]/article[@class="product_page"]/p/text()').map(x => x.wholeText)
["“Erotic and absorbing…Written with starling powe…on to each other and their affair begins. …more"]
$x('//div[@id="content_inner"]/article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)
(2) ["↵ ", "↵ ↵ In stock (20 available)↵ ↵"]


Get the product names

$x('//article/h3/a[@title]/text()').map(x => x.data)

The big lesson is that it's better to work with two screens!

For the description I used following-sibling to get sibling nodes:

$x('//div[@id="product_description"]/following-sibling::p/text()')
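For readers who want to see what the following-sibling axis selects, here is a minimal Python sketch. The standard-library ElementTree does not implement XPath axes, so the helper `following_siblings` below is a hypothetical stand-in that emulates the axis, and the `article` markup is a simplified assumption of the page structure:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical product page fragment
article = ET.fromstring(
    "<article class='product_page'>"
    "<div id='product_description'><h2>Product Description</h2></div>"
    "<p>It's hard to imagine a world without A Light in the Attic.</p>"
    "<table><tr><td>In stock</td></tr></table>"
    "</article>"
)

def following_siblings(parent, anchor):
    """Emulate following-sibling::* : children of `parent` that come after `anchor`."""
    children = list(parent)
    return children[children.index(anchor) + 1:]

anchor = article.find("./div[@id='product_description']")
# Keep only the <p> siblings, like following-sibling::p
description = [el.text for el in following_siblings(article, anchor) if el.tag == "p"]

print(description)
```

The paragraph is not a child of the div with the id; it sits next to it, which is why the sibling axis is needed.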

For the book description:

For the stock:

Expression to get the description of the book Tipping the Velvet:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Expression to get the stock:

$x('//article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)

We can also strip the whitespace directly in JavaScript: inside the map function, just add ".trim()" after wholeText.

map(texto => texto.wholeText.trim())

This leaves the category names clean.

$x('//p[@class="instock availability"]/text()').map(x=>x.wholeText)
(2) ['\n    ', '\n    \n        In stock (22 available)\n    \n']
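The two-element result above is typical: one whitespace-only node plus the node with the actual text. A small Python sketch of the cleanup, starting from the console output shown above (the regular expression for the unit count is an assumption about the "In stock (N available)" format):

```python
import re

# Raw result as printed in the console output above
nodes = ['\n    ', '\n    \n        In stock (22 available)\n    \n']

# Drop whitespace-only nodes and trim the rest
cleaned = [n.strip() for n in nodes if n.strip()]

# Pull out the number of available units, if the text has the expected shape
match = re.search(r"\((\d+) available\)", cleaned[0])
units = int(match.group(1)) if match else None

print(cleaned)  # ['In stock (22 available)']
print(units)    # 22
```

In the browser console, text()[2] combined with trim() gives the same cleaned string.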

My contribution as of February 23, 2023. This code works:

import requests 
import lxml.html as html 
import os
import datetime


HOME_URL= 'https://www.larepublica.co/'
XPATH_LINK_TO_ARTICLE= '//text-fill[not(@class)]/a/@href'
XPATH_TITLE = '//div[@class="mb-auto"]//span/text()'
XPATH_SUMMARY='//div[@class="lead"]/p/text()'
XPATH_BODY='//div[@class="html-content"]/p/text()'


def parse_notice(link, today):
    try:
        response = requests.get(link)
        if response.status_code == 200:
            notice = response.content.decode('utf-8')
            parsed = html.fromstring(notice)

            try:
                title = parsed.xpath(XPATH_TITLE)[0]
                title = title.replace('\"','')
                summary = parsed.xpath(XPATH_SUMMARY)[0]
                body = parsed.xpath(XPATH_BODY)
            except IndexError:
                return

            with open(f'{today}/{title}.txt','w',encoding='utf-8') as f:
            #with open('{}/{}.txt'.format(today,title), 'w',encoding='utf-8') as f:
                f.write(title)    
                f.write('\n\n')
                f.write(summary)
                f.write('\n\n')
                for p in body:
                    f.write(p)
                    f.write('\n')
        else:
            raise ValueError(response.status_code)
    except ValueError as ve:
        print(ve)


def parse_home():
    try:
        response = requests.get(HOME_URL)
        if response.status_code == 200:
            home = response.content.decode('utf-8')
            parsed= html.fromstring(home)
            links_to_notices = parsed.xpath(XPATH_LINK_TO_ARTICLE)
            #print(links_to_notices)

            today = datetime.date.today().strftime('%d-%m-%Y')
            if not os.path.isdir(today):
                os.mkdir(today)

            # Parse the articles even when today's folder already exists
            for link in links_to_notices:
                parse_notice(link, today)


        else:
            raise ValueError(f'ERROR: {response.status_code}')
    except ValueError as ve:
        print(ve)
      
def run():
    parse_home()

if __name__ =='__main__':
    run()
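The script above builds each file name from the article title after removing only double quotes, so a title containing characters such as "/" or "?" could make open() fail. One possible hardening, shown as a sketch: `safe_filename` is a hypothetical helper that is not part of the original script, and the character list is an assumption covering common Windows and POSIX restrictions:

```python
import re

def safe_filename(title, max_length=100):
    """Hypothetical helper: make a scraped title usable as a file name."""
    # Remove characters that are invalid in file names on Windows or POSIX
    cleaned = re.sub(r'[\\/:*?"<>|]', "", title)
    # Collapse whitespace runs (including newlines) into single spaces
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Fall back to a fixed name if nothing is left
    return cleaned[:max_length] or "untitled"

print(safe_filename('Dolar/peso: "hoy"?'))  # Dolarpeso hoy
```

It could be applied where the script currently calls title.replace('\"','') before opening the file.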

The book I selected was:
It's Only the Himalayas
To extract the product description:

$x('//article[@class = "product_page" ]/p/text()').map(x => x.wholeText)

To extract the stock:

$x('//div[@class = "col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)

My solution was:

$x('//h3/a/text()').map(x => x.wholeText)

The exercise can be solved in many ways, since you can walk through each element along the path (easier to follow) or go straight to the target.

My solution was:

  • Book description
    $x('//article[@class="product_page"]/p/text()').map(x=>x.wholeText)
  • Units in stock
    $x('//table[@class="table table-striped"]/tbody/tr[6]/td/text()').map(x=>x.wholeText)

Here is my contribution to the challenge:

<Book description (I picked the first one, 'A Light in the Attic')>

$x('//div/article[@class = "product_page"]/p/text()').map(x => x.wholeText)

<Available stock>

$x('//div/article[@class = "product_page"]/table[@class = "table table-striped"]/tbody/tr/td/text()').map(x => x.wholeText)[5].slice(10, -1)

Description

$x('//article[@class = "product_page"]/p/text()').map(x => x.wholeText)

stock

$x('//div[@class = "col-sm-6 product_main"]/p[@class = "instock availability"]/text()').map(x => x.wholeText)

https://books.toscrape.com/catalogue/sharp-objects_997/index.html

// Book description
$x('//article[@class="product_page"]/p[position()=1]/text()')[0].wholeText

// Book stock
$x('//table[@class="table table-striped"]/tbody/tr[6]/td/text()')[0].wholeText

Product description:
//div[@class="content"]/div[2]/article[@class="product_page"]//h2/font/font/text()
//*[@id="content_inner"]/article/p/font/font[1]
In stock:
//*[@id="content_inner"]/article/table/tbody/tr[6]/td/font/font

These are my solutions. I wasn't trying to optimize, just to understand the mechanism.

Get the title:

$x('//article[@class= "product_page"]/div/div/h1/text()').map(x => x.wholeText)

Or

$x('//div[@class= "col-sm-6 product_main"]/h1/text()').map(x => x.wholeText)

Get the price:

$x('//div[@class= "col-sm-6 product_main"]/p[1]/text()').map(x => x.wholeText)

Get the stock:

$x('//div[@class= "col-sm-6 product_main"]/p[2]/text()').map(x => x.wholeText)

Get the description:

$x('//article[@class= "product_page"]/p/text()').map(x => x.wholeText)

Challenge solution:

This extracts the book's synopsis:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

This extracts the book's stock:

$x('//table[@class="table table-striped"]/tbody/tr/td[contains(.,"In stock")]/text()').map(x => x.wholeText)

Exercise:

Description:

$x('//article[@class="product_page"]/p/text()')[0]

Stock:

$x('//article[@class="product_page"]/div[@class="row"]//p[@class="instock availability"]/text()')[1].wholeText.trim()

Here is my solution:
$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText) // Description

$x('//table[@class="table table-striped"]/tbody/tr[6]/td/text()').map(x => x.wholeText) // Stock

$x('//h1/text()').map(x=>x.wholeText)
['A Light in the Attic']

$x('//p[@class="instock availability"]/text()').map(x=>x.wholeText)
['\n ', '\n \n In stock (22 available)\n \n']

$x('//div[@class="side_categories"]//a/text()').map(x=>x.wholeText)

I've noticed that extracting data cleanly from a given page also depends a lot on the HTML following good practices and having well-organized elements.

Description

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Stock
$x('//p[@class="instock availability"]/text()').map(x => x.wholeText)

After seeing everyone else's solutions, I realize I could have made mine much shorter xD
$x('//div[@class="container-fluid page"]/div[@class="page_inner"]/div[@class="content"]/div[@id="content_inner"]/article[@class="product_page"]/p/text()').map(x => x.wholeText)

Thanks to Oscar's contribution, I realized the expression can be shortened significantly. I think we all solved the description easily, but look at the stock:

$x('//*[@id="content_inner"]/article/div[1]/div[2]/p[2]/text()[2]').map(x => x.wholeText.trim())

Book description

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Available stock

$x('//article[@class="product_page"]//tr[last()-1]/td/text()').map(x => x.wholeText)

Challenge

Page: an Amazon product

  • Getting the description from a web page, for example Amazon:
$x('//div[@id="dp"]/div[@id="dp-container"]/div[@id="ppd"]/div[4]/div[@id="productOverview_feature_div"]/div[@class="a-section a-spacing-small a-spacing-top-small"]//text()').map(x => x.wholeText)
  • Getting the stock quantity:
 $x('//div[@id="dp"]/div[@id="dp-container"]/div[@id="ppd"]/div/div[4]/div/div/div/div/form/div/div/div/div/div[3]/div/div[9]/div/div/span/div/div/span/span/span/span/span/text()').map(x => x.wholeText)

Hi ✌🏻, these are the expressions I came up with.

Expression to get the description:

$x('//div[@id="content_inner"]/article[@class="product_page"]/p/text()').map(x => x.wholeText)

Expression to extract the number in stock:

$x('//div[@id="content_inner"]/article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)

My expressions for the challenge came out like this:

// For the description
$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)
['"Erotic and absorbing...Written with starling powe...on to each other and their affair begins. ...more']


// For the available stock
$x('//div[@class="row"]//p[@class="instock availability"]/text()').map(x => x.wholeText)
(2) ['\n    ', '\n    \n        In stock (20 available)\n    \n']

Here is the code I used:

// This is the code I used to get the description.
$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

// This is the code I used to get the stock.
$x('//article[@class="product_page"]/div/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)

//description
$x('//article[@class="product_page"]/p/text()').map(e=>e.wholeText)
//stock
$x('//div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()[2]').map(e=>e.wholeText.trim())

For the stock:

$x('//article[@class="product_page"]//p[@class="instock availability"]/text()').map(x=>x.wholeText)

For the book description:

$x('//article[@class="product_page"]/p/text()').map(x=>x.wholeText)

Stock

$x('//div[@class = "col-sm-6 product_main"]/p[@class = "instock availability"]/text()').map(x => x.wholeText)

Description

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Good morning, here is my contribution to the challenge:
Description:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

stock:

$x('//div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()[2]').map(x => x.wholeText)

My contribution.

Stock:

$x('//article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/font//text()[last()]').map(x => x.wholeText.trim())

Description:

$x('//*[@id="content_inner"]/article/p//font/text()').map(x => x.wholeText)
// Show the list of categories without the whitespace and without the "\n" character
$x('//div[@class="side_categories"]/ul[@class="nav nav-list"]/li/ul/li/a').map(x => x.innerText)

Challenge:

// Get the product title
$x('//article[@class="product_page"]//div[@class="col-sm-6 product_main"]/h1')[0].innerText

// Get the product price
$x('//article[@class="product_page"]//div[@class="col-sm-6 product_main"]/p[@class="price_color"]')[0].innerText

I had a lot of fun in this class.

The challenges aren't very complicated; take a few minutes to solve them:

$x('//article[@class="product_page"]/p/text()').map(x =>x.wholeText)

$x('//table[@class="table table-striped"]/tbody/tr[6]/td/text()').map(x => x.wholeText)

Here are my XPath expressions.
Book description:
$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Available stock:
$x('//article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"][1]/text()').map(x => x.wholeText)

Challenge:
Description

$x('//div[@id="content_inner"]/*/p/node()').map(x => x.wholeText)

Stock

$x('//div[@id="content_inner"]/*/table[@class="table table-striped"]/*/tr[6]/td/node()').map(x => x.wholeText)
  • Description
$x('//*[@id="content_inner"]/article/p/text()').map( x => x.wholeText)


  • Price
$x('//*[@id="content_inner"]/article/div[1]/div[2]/p[1]/text()').map(x => x.wholeText)


$x('//div[@class="col-sm-6 product_main"]/p[@class="price_color"]/text()').map(x => x.wholeText)

There is a Chrome extension that helps you get the path of any HTML element with a single click!

https://chrome.google.com/webstore/detail/xpath-helper/hgimnogjllphhhkhlmebbmlgjoejdpjl/related

$x('//article[@class = "product_page"]/p/text()').map(x=>x.wholeText)

“It’s hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein’s humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love th It’s hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein’s humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love that Silverstein. Need proof of his genius? RockabyeRockabye baby, in the treetopDon’t you know a treetopIs no safe place to rock?And who put you up there,And your cradle, too?Baby, I think someone down here’sGot it in for you. Shel, you never sounded so good. …more”

$x('//article[@class = "product_page"]/table[@class = "table table-striped"]/tbody/tr/td[contains(.,"In stock")]/text()').map(x=>x.wholeText)

In stock (22 available)

(21) ['¿Qué es el web scraping?', '¿Por qué aprender web scraping hoy?', 'Python: el lenguaje más poderoso para extraer datos', 'Entender HTTP', '¿Qué es HTML?', 'Robots.txt: permisos y consideraciones al hacer web scraping', 'XML Path Language', 'Tipos de nodos en XPath', 'Expresiones en XPath', 'Predicados en Xpath', 'Operadores en Xpath', 'Wildcards en Xpath', 'In-text search en Xpath', 'XPath Axes', 'Resumen de XPath', 'Aplicando lo aprendido', 'Un proyecto para tu portafolio: scraper de noticias', 'Construcción de las expresiones de XPath', 'Obteniendo los links de los artículos con Python', 'Guardando las noticias en archivos de texto', 'Cómo continuar tu ruta de aprendizaje']

$x('//div[@class = "Syllabus-class-container-text"]/p/text()').map(x=>x.wholeText)

For the stock:

$x('//article[@class="product_page"]/table[@class="table table-striped"]/tbody/tr/td[contains(., "stock")]/text()').map(x => x.wholeText)
['In stock (22 available)']

For the description:

$x('//article[@class="product_page"]/p/text()').map(x=> x.wholeText)

Description without spaces:

$x('//article/h3/a/text()').map(x => x.data.replace(/[_\W]+/g,""))

Description of the book A Light in the Attic:
$x('//*[@id="content_inner"]/article/p/text()').map(x=>x.wholeText)

Stock:
$x('//*[@id="content_inner"]/article/div[1]/div[2]/p[2]/text()').map(x=>x.wholeText)

  • Product description:
$x('//article/p/text()').map(x => x.wholeText)
  • Available stock:
$x('//article/table[@class="table table-striped"]//tr//td[contains(.,"In")]/text()').map(x => x.wholeText)

A condensed example to get the categories:

$x('//a[@href[contains(.,"/category/books/")]]/text()')

A condensed example to get the prices:

$x('//*/text()[contains(.,"£")]').map(x=>x.wholeText)

Description:

$x('//div[@class="content"]//article[@class="product_page"]/p/text()').map(x => x.wholeText)
["It's hard to imagine a world without A Light in th…for you. Shel, you never sounded so good. ...more"]

Stock:

$x('//div[@class="content"]//article[@class="product_page"]/div/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()[2]').map(x => x.wholeText.trim())
['In stock (22 available)']

Hi, here is an alternative to the instructor's code:

$x('//article[@class="product_pod"]/div/p[@class="price_color"]/node()').map(x => x.wholeText)

To find the stock you can do this:

$x('//tbody/tr[last()-1]/td/text()').map(x => x.wholeText)

$x('//article[@class="product_page"]/p/text() | //p[@class="instock availability"]/text() ').map(x=>x.wholeText)
  • Description:
$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

  • In stock:
$x('//article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)

The first one was easier.

$x('//article[@class="product_page"]/p/text()').map(x=>x.nodeValue)

The second one was harder for me 😕

$x('//article[@class="product_page"]/div[@class="row"]/div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()[last()]').map(x=>x.nodeValue)

Stock:

$x('//p[@class="instock availability"]/text()[2]').map(x => x.wholeText.trim())

Description:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

To get the available units:

$x('//div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/node()').map(x => x.wholeText)

And to get the description:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Challenges

Challenge
Availability:

$x('//article[@class="product_page"]/table[@class="table table-striped"]/tbody/tr[6]/td/text()').map(x => x.wholeText)

Description:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

Challenge:

Description:

$x('/html/body/div/div/div/div/article/p/text()').map(x => x.wholeText)

Availability:

$x('/html/body/div/div/div[2]/div[2]/article/table/tbody/tr[6]/td/text()').map(x => x.wholeText)

$x('//article[@class="product_page"]/p/text()').map(x=>x.wholeText)

$x('//article[@class="product_page"]/table[@class="table table-striped"]/tbody/tr/td[contains(.,"stock")]/text()').map(x=>x.wholeText)

Challenge contribution!
1.- Select a book

  • extract its description
$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)
  • extract the stock
$x('//div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)

Challenge:

-. Product description:

$x("//article[@class='product_page']/p/text()")[0].wholeText

-. Stock:

$x("//article[@class='product_page']//p[@class='instock availability']/text()")[1].wholeText

stock

$x('//div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(k => k.wholeText)
(2) ['\n    ', '\n    \n        In stock (22 available)\n    \n']

Product Description

$x('//article[@class="product_page"]/p/text()').map(k => k.wholeText)
["It's hard to imagine a world without A Light in th…for you. Shel, you never sounded so good. ...more"]

Description:
The description paragraph has no id of its own, but it directly follows the div with id "product_description". Since an id cannot match more than one node, I consider anchoring on it quite robust.

//div[@id="product_description"]/following-sibling::p/text()

Stock:
Be very careful here. When you look for the stock p element, don't match on "instock availability", because that only works in this case. The instock class indicates that the item is in stock; if you change it to outofstock you will see the text turn red. So in the hypothetical case of no stock, an XPath that requires "instock availability" would fail.

//div[contains(@class, "product_main")]/p[contains(@class, "availability")]/text()
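The point about not hard-coding "instock" can be sketched in Python. The example uses the standard-library ElementTree on two hypothetical fragments (the out-of-stock markup is an assumption based on the outofstock class mentioned above), and `has_class` is a hypothetical helper that matches a single class token, which is slightly stricter than the substring test done by contains(@class, ...):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: one book in stock, one sold out
IN_STOCK = ET.fromstring(
    "<div class='col-sm-6 product_main'>"
    "<p class='price_color'>51.77</p>"
    "<p class='instock availability'>In stock (22 available)</p>"
    "</div>"
)
SOLD_OUT = ET.fromstring(
    "<div class='col-sm-6 product_main'>"
    "<p class='price_color'>51.77</p>"
    "<p class='outofstock availability'>Out of stock</p>"
    "</div>"
)

def has_class(element, name):
    """True when `name` is one of the space-separated class tokens."""
    return name in element.get("class", "").split()

def availability(tree):
    """Match on the 'availability' token only, whatever the stock state."""
    return [p.text for p in tree.iter("p") if has_class(p, "availability")]

def exact_match(tree):
    """Match on the full attribute value, like @class="instock availability"."""
    return [p.text for p in tree.findall("./p[@class='instock availability']")]

print(availability(IN_STOCK))  # ['In stock (22 available)']
print(availability(SOLD_OUT))  # ['Out of stock']
print(exact_match(SOLD_OUT))   # []
```

The exact-value match returns nothing for the sold-out fragment, which is the failure the comment above warns about.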

CHALLENGE:

The first line of code is the one I wrote; the second uses a classmate's method.

Book description

Stock

Thanks to this method I realized I don't need to be so redundant. It's all learning, and we learn the most from mistakes! 🎇
👾

My contribution.

For the stock:

$x('//article[@class="product_page"]//div[@class="col-sm-6 product_main"]/p[@class="instock availability"]/text()').map(x => x.wholeText)

For the description:

$x('//article[@class="product_page"]/p/text()').map(x => x.wholeText)

The interesting part would be extracting the books and their prices grouped by category.

Extract the book description

$x('//div[@class="content"]/div[@id="content_inner"]/article[@class="product_page"]/p/text()').map(x => x.wholeText)

Extract the available stock

$x('//div[@class="content"]/div[@id="content_inner"]/article[@class="product_page"]/table[@class="table table-striped"]/tbody/tr/td[contains(., "stock")]/text()').map(x => x.wholeText)

Reviewing other contributions, I noticed it isn't necessary to go that far back through the nodes in this case. I'm still including the example as an illustration.

stock:
$x('//table[@class="table table-striped"]//tr[6]/td/text()').map(x=>x.wholeText)
description:
$x('//div[@class="content"]/div[@id="content_inner"]/article[@class="product_page"]/p/text()').map(x=>x.wholeText)

$x('//table[@class="table table-striped"]/tbody/tr[6]/td//text()').map(x => x.wholeText)

For the description, this is what worked for me:
$x('//article[@class="product_page"]/p//text()').map(x => x.wholeText)