Ethical and legal considerations of web scraping

Class 8 of 15 • Web Scraping with Python course

Summary

Have you ever considered the legality and ethics of web scraping? A few key considerations help you do it responsibly. One essential step is to check a specific file called robots.txt.

What is the robots.txt file and how do you use it?

The robots.txt file indicates which parts of a website may or may not be scraped. It is normally located at the site's main URL followed by /robots.txt. The file sets out conventions that are neither legally binding nor technically enforced, but that you should respect.

For example, the Platzi platform restricts scraper access to routes such as the home page, classes, and comments, but specifically allows access to the diplomas area. Another example is MercadoLibre, which restricts access to the root for certain bots that train artificial intelligence models; specific routes in the sales flow, however, remain available.
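The check described above can be automated with Python's standard urllib.robotparser module. This is a minimal sketch, not the course's own code: the rules in SAMPLE_ROBOTS_TXT, the example.com host, and the "my-scraper" agent name are hypothetical and only loosely mirror the Platzi and MercadoLibre examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not copied from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /diplomas/
Disallow: /clases/
Disallow: /comentarios/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
BASE = "https://example.com"  # placeholder host

# A generic scraper falls under the "*" group.
print(parser.can_fetch("my-scraper", BASE + "/diplomas/123"))   # True
print(parser.can_fetch("my-scraper", BASE + "/clases/python"))  # False
print(parser.can_fetch("my-scraper", BASE + "/about"))          # True (no rule matches)

# The AI-training bot has its own group that blocks the whole site.
print(parser.can_fetch("GPTBot", BASE + "/diplomas/123"))       # False
```

To check a live site instead of a string, the same class offers set_url() followed by read(), which downloads the file before you call can_fetch().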

What should you do when there is no robots.txt?

Some sites, such as Books to Scrape, do not define a robots.txt file. In those cases the ethical considerations still apply and remain necessary for good practice.

How do you implement good practices in web scraping?

When scraping, it is essential not to overload the site's servers. The recommendation is to imitate human behavior in the timing and frequency of requests. To do this properly, consider:

Define a User-Agent header that matches the browser you are using.
To find your own user agent, open the browser's developer tools, go to the Network tab, reload the page, and copy the User-Agent value.
Add random pauses between requests to simulate real interactions, varying between roughly one and three seconds.

These practices help keep your scraping ethical and responsible: they imitate how a real user browses and respect the limits that websites set.
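The two practices above, a browser-like User-Agent header and random pauses of one to three seconds, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the USER_AGENT string is a placeholder to replace with the value copied from your own browser, and the helper names build_request, random_pause, and fetch_all are hypothetical.

```python
import random
import time
import urllib.request

# Placeholder value; replace it with the User-Agent copied from your browser.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url):
    """Create a request object that carries the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def random_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Wait a random time inside the given range and return the delay used."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def fetch_all(urls, fetch, sleep=time.sleep):
    """Fetch each URL in order, pausing randomly between consecutive requests."""
    pages = []
    for index, url in enumerate(urls):
        if index > 0:  # no pause is needed before the first request
            random_pause(sleep=sleep)
        pages.append(fetch(build_request(url)))
    return pages
```

For real requests, fetch could be something like lambda req: urllib.request.urlopen(req).read(). Passing fetch and sleep as parameters is a design choice that lets you try the loop without touching the network or actually waiting.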

Have you checked the robots.txt file of your favorite website yet? We invite you to do so and share your findings in the comments.

Felipe Moreno

student•

If you wondered about it too, so did I. Here is Platzi's robots.txt 🚀, as I interpret it:

🌐 What you COULD scrape:

General settings
Specific details of course diplomas

❌ What you could NOT scrape:

Course materials
Comments and user contributions
Subscription data
Payment statistics
Private or restricted-access sections
- Example: pages related to user authentication
- Payment pages

🔒 Reason: These restrictions protect users' privacy and confidential information.

Aldo G Pineda

student•

Good practices for ethical scraping

1. Identify yourself: include contact information in the user-agent
2. Minimize impact: add delays between requests
3. Caching: store data you have already extracted to avoid repeated requests
4. Respect limits: heed 429 (Too Many Requests) responses
5. Be selective: extract only what you need, not the whole site
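Two of the points above, caching and reacting to 429 responses, can be sketched as small helpers. This is only an illustration under stated assumptions: the names retry_delay and get_cached are hypothetical, and only the numeric-seconds form of the Retry-After header is handled (the header may also carry an HTTP date, which falls back to exponential backoff here).

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying a request that received a 429.

    attempt is 0 for the first retry. If the server sent a numeric
    Retry-After header, that value is honored as-is; otherwise the wait
    doubles on every attempt, up to cap seconds.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after)
    return min(base * 2 ** attempt, cap)


def get_cached(url, cache, fetch):
    """Return the stored page for url, calling fetch only on a cache miss."""
    if url not in cache:
        cache[url] = fetch(url)
    return cache[url]


print(retry_delay(0))        # 1.0
print(retry_delay(3))        # 8.0
print(retry_delay(10))       # 60.0 (capped)
print(retry_delay(0, "5"))   # 5.0 (server-provided wait)
```

A plain dict works as the cache for a single run; persisting it to disk would avoid repeated requests across runs as well.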

David Rosas

student•

By 1., do you mean what we did in class?

Neicer Vásquez

student•

If a page has no robots.txt file, technically you can scrape it without restrictions. However, if you are later blocked, it may be because the site does not allow scraping, regardless of the missing file. It is essential to act ethically and respect the site's policies. It is always advisable to simulate human behavior, such as varying the timing of your requests and setting a suitable user agent, to avoid being blocked.

Milton Chavez Palomino

student•

The robots.txt file of Falabella Perú is interesting. They welcome robots and clearly specify which URLs are allowed for scraping and which are restricted. https://www.falabella.com.pe/robots.txt

Jeinfferson Bernal G

student•

What an interesting class. I suppose that if you analyze data obtained without permission, you could be sued by the company the data comes from.

Marcos Sar Lo

student•

Analyzing data obtained without permission can have legal implications. Depending on the country's legislation and the website's policies, you could be sued for violating copyright, privacy, or terms of service. The good practice is to always check the robots.txt file and make sure you have permission to scrape and use the data, as the course's material on ethical scraping mentions.

IA Platzi

Cristian Cardenas

student•

Sharing LinkedIn's robots.txt: https://www.linkedin.com/robots.txt

Alexis Max Caceres Soria

student•

I had never seen robots.txt before; I find it very interesting, thanks.

Oscar Guaricallo

student•

robots.txt

Most websites provide a file called robots.txt, which is used to tell web crawlers what they can scrape and what they should not touch. Naturally, it is up to the developer to respect these recommendations, but I advise you to always obey the contents of the robots.txt file.

Let’s see one example of such a file:

User-agent: *
Disallow: /covers/
Disallow: /api/
Disallow: /*checkval
Disallow: /*wicket:interface
Disallow: ?print_view=true
Disallow: /*/search
Disallow: /*/product-search
Allow: /*/product-search/discipline
Disallow: /*/product-search/discipline?*facet-subj=
Disallow: /*/product-search/discipline?*facet-pdate=
Disallow: /*/product-search/discipline?*facet-type=category

The preceding code block is from www.apress.com/robots.txt. As you can see, most content tells what is disallowed. For example, scrapers shouldn’t scrape www.apress.com/covers/.

Website Scraping with Python - Gabor Laszlo Hajba
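The quoted file uses * wildcards inside rule paths, and an Allow rule sits next to a broader Disallow. Below is a minimal sketch of how such rules can be evaluated, assuming the longest-match precedence described in the Robots Exclusion Protocol (RFC 9309), where the most specific matching rule wins and Allow wins a tie. It is not code from the book; the function names are hypothetical, and only three of the quoted rules are used.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Apply (directive, pattern) rules to a path; the longest matching pattern wins."""
    best_length, allowed = -1, True  # a path that no rule matches is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            longer = len(pattern) > best_length
            tie_for_allow = len(pattern) == best_length and is_allow
            if longer or tie_for_allow:
                best_length, allowed = len(pattern), is_allow
    return allowed


# Three rules taken from the sample quoted above.
RULES = [
    ("Disallow", "/covers/"),
    ("Disallow", "/*/product-search"),
    ("Allow", "/*/product-search/discipline"),
]

print(is_allowed(RULES, "/covers/book.jpg"))               # False
print(is_allowed(RULES, "/gp/product-search"))             # False
print(is_allowed(RULES, "/gp/product-search/discipline"))  # True
print(is_allowed(RULES, "/about"))                         # True
```

The third call returns True because the Allow pattern is longer, and therefore more specific, than the Disallow pattern that also matches.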

Isaac Bryan Ascanoa Roncall

student•

The ethical page I have is: https://www.wordreference.com/robots.txt

Esteban Monguí Torres

student•

Sharing the robots.txt of the Securities and Exchange Commission's site: https://www.sec.gov/robots.txt.

Juan Salazar Saenz

student•

Facebook's robots.txt

# Notice: Collection of data on Facebook through automated means is
# prohibited unless you have express written permission from Facebook
# and may only be conducted for the limited purpose contained in said
# permission.
# All authorized user-agents listed on this page must comply with Meta’s
# Automated Data Collection Terms available at:
# https://www.facebook.com/legal/automated_data_collection_terms

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: uptimerobot
Disallow: /

User-agent: viberbot
Disallow: /

User-agent: YaK
Disallow: /

User-agent: Yandex
Disallow: /

User-agent: Yeti
Disallow: /

User-agent: Applebot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Bingbot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Discordbot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: DuckDuckBot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: facebookexternalhit Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Googlebot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Googlebot-Image Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /map_tile.php Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /static_map.php Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Googlebot-Video Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Googlebot-News Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Google-InspectionTool Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: LinkedInBot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: msnbot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Pinterestbot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Screaming Frog SEO Spider Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: seznambot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Slurp Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: teoma Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: TelegramBot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: Twitterbot Disallow: /*/plugins/* Disallow: /?*next= Disallow: /a/bz? Disallow: /ajax/ Disallow: /album.php Disallow: /checkpoint/ Disallow: /contact_importer/ Disallow: /dialog/ Disallow: /fbml/ajax/dialog/ Disallow: /feeds/ Disallow: /file_download.php Disallow: /job_application/ Disallow: /l.php Disallow: /login.php*&next= Disallow: /login.php/?next= Disallow: /login.php?next= Disallow: /login/*&next= Disallow: /login/?next= Disallow: /login/device-based/regular/login/*&next= Disallow: /login/device-based/regular/login/?next= Disallow: /moments_app/ Disallow: /p.php Disallow: /photos.php Disallow: /plugins/ Disallow: /share.php Disallow: /share/ Disallow: /sharer.php Disallow: /sharer/ Disallow: /tr/ Disallow: /tr? Disallow: /ufi/reaction/profile/browser/ Disallow: /x/oauth/ Allow: /ajax/bootloader-endpoint/ Allow: /ajax/pagelet/generic.php/PagePostsSectionPagelet Allow: /careers/ Allow: /safetycheck/

User-agent: *
Disallow: /