Merged
Changes from 6 commits
35 changes: 35 additions & 0 deletions docs/guides/typescript_project.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ title: TypeScript Projects
description: Stricter, safer, and better development experience
---

import ApiLink from '@site/src/components/ApiLink';

Crawlee is built with TypeScript, so it ships its type definitions directly in the package. This enables auto-completion in both TypeScript and JavaScript code. In addition, projects written in TypeScript get compile-time type-checking, which catches many coding mistakes, along with inline documentation for functions, parameters and return values. It also makes refactoring much easier and helps keep bugs from sneaking through.

## Setting up a TypeScript project
Expand Down Expand Up @@ -152,3 +154,36 @@ Let's wrap it up to. In addition to the scripts we described above, we also need
}
}
```

## Type-safe router labels and `userData`

When you structure a crawler with a <ApiLink to="core/class/Router">`Router`</ApiLink>, each route handler reads `request.userData`. By default `userData` is loosely typed, so a typo in a label or a wrong `userData` property is only caught at runtime.

You can instead declare a **route map** — an `interface` (or `type`) that maps each label to the shape of `userData` expected for it — and pass it as the second type argument to a `createXRouter` factory (or <ApiLink to="core/class/Router#create">`Router.create`</ApiLink>). Handlers then get `request.userData` typed per label, and unknown labels become a compile-time error:

```ts
import { createCheerioRouter, type CheerioCrawlingContext } from 'crawlee';

interface Routes {
    PRODUCT: { sku: string; price: number };
    CATEGORY: { categoryId: string };
}

const router = createCheerioRouter<CheerioCrawlingContext, Routes>();

router.addHandler('PRODUCT', async ({ request }) => {
    request.userData.sku; // string
    request.userData.price; // number
});

router.addHandler('CATEGORY', async ({ request }) => {
    request.userData.categoryId; // string
});

// ❌ compile error: 'TYPO' is not a label declared in `Routes`
router.addHandler('TYPO', async () => {});
```

The default handler registered via `addDefaultHandler` is a fallback for any request (including labels not in the map), so its `request.userData` stays loosely typed (`Record<string, unknown>` by default) — pass an explicit type argument to narrow it.

This is a compile-time-only feature with **no runtime cost**, and it is fully backwards compatible. Omitting the route map keeps the original loose typing, and passing a plain `userData` shape (e.g. `createCheerioRouter<CheerioCrawlingContext, { token: string }>()`) still types every handler with that shape, exactly as before.
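The per-label typing and the loosely typed fallback described above can be modelled without Crawlee itself. The following is a minimal sketch, not the library's implementation: `MiniRouter`, `MiniRequest`, `Handler` and `handle` are hypothetical stand-ins introduced only for illustration, and `Dictionary` is simplified to `Record<string, any>`.

```ts
// Simplified stand-ins for illustration only; these are NOT Crawlee's real types.
type Dictionary = Record<string, any>;

interface MiniRequest<UserData extends Dictionary = Dictionary> {
    label?: string;
    userData: UserData;
}

type Handler<UserData extends Dictionary> = (ctx: { request: MiniRequest<UserData> }) => void;

class MiniRouter<Routes extends Record<keyof Routes, Dictionary>> {
    private readonly routes = new Map<string, Handler<any>>();
    private fallback: Handler<any> | undefined;

    // The label selects the `userData` shape from the route map.
    addHandler<Label extends keyof Routes & string>(label: Label, handler: Handler<Routes[Label]>): void {
        this.routes.set(label, handler);
    }

    // The fallback can receive any request, so narrowing is opt-in via a type argument.
    addDefaultHandler<UserData extends Dictionary = Dictionary>(handler: Handler<UserData>): void {
        this.fallback = handler;
    }

    // Hypothetical dispatcher: use the labelled handler if one exists, else the fallback.
    handle(request: MiniRequest): void {
        const labelled = request.label === undefined ? undefined : this.routes.get(request.label);
        const handler = labelled ?? this.fallback;
        if (!handler) throw new Error(`No handler for label: ${request.label}`);
        handler({ request });
    }
}

interface Routes {
    PRODUCT: { sku: string; price: number };
    CATEGORY: { categoryId: string };
}

const seen: string[] = [];
const router = new MiniRouter<Routes>();

router.addHandler('PRODUCT', ({ request }) => {
    // `sku` is a string and `price` is a number here, taken from `Routes['PRODUCT']`.
    seen.push(`product:${request.userData.sku}:${request.userData.price.toFixed(2)}`);
});

router.addDefaultHandler<{ token: string }>(({ request }) => {
    // Narrowed only because of the explicit type argument.
    seen.push(`default:${request.userData.token}`);
});

router.handle({ label: 'PRODUCT', userData: { sku: 'A1', price: 9.5 } });
router.handle({ label: 'UNKNOWN', userData: { token: 't0' } });
```

All of the typing lives in the method signatures. The stored handlers are plain functions in a `Map`, which is why a feature of this kind adds nothing at runtime.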
9 changes: 7 additions & 2 deletions packages/basic-crawler/src/internals/basic-crawler.ts
Expand Up @@ -2149,9 +2149,14 @@ interface HandlePropertyNameChangeData<New, Old> {
* await crawler.run();
* ```
*/
export function createBasicRouter<
Context extends BasicCrawlingContext = BasicCrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createBasicRouter<
Context extends BasicCrawlingContext = BasicCrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createBasicRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}
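The pair of overloads above is what lets the second type argument mean either a route map or the legacy flat `userData` shape. Below is a minimal sketch of that overload pattern under simplified assumptions: `createMiniRouter`, `RouteTable`, `Ctx` and `Dispatch` are hypothetical names, handlers return strings so the result is easy to inspect, and `Dictionary` is reduced to `Record<string, any>`.

```ts
// Simplified stand-ins for illustration only; these are NOT Crawlee's real types.
type Dictionary = Record<string, any>;
type Ctx<UserData extends Dictionary> = { request: { label?: string; userData: UserData } };
type RouteTable<Routes extends Record<keyof Routes, Dictionary>> = {
    [Label in keyof Routes]: (ctx: Ctx<Routes[Label]>) => string;
};
type Dispatch = (ctx: Ctx<Dictionary>) => string;

// Overload 1: the type argument is a route map (every value is itself a dictionary).
function createMiniRouter<Routes extends Record<keyof Routes, Dictionary>>(routes: RouteTable<Routes>): Dispatch;
// Overload 2: the type argument is a flat `userData` shape shared by every handler.
function createMiniRouter<UserData extends Dictionary>(routes: RouteTable<Record<string, UserData>>): Dispatch;
// Loosely typed implementation shared by both overloads.
function createMiniRouter(routes: RouteTable<any>): Dispatch {
    return (ctx) => {
        const { label } = ctx.request;
        const handler = label === undefined ? undefined : routes[label];
        if (!handler) throw new Error(`No handler for label: ${String(label)}`);
        return handler(ctx);
    };
}

interface Routes {
    PRODUCT: { sku: string };
    CATEGORY: { categoryId: string };
}

// Matches overload 1: each handler sees the shape declared for its own label.
const perLabel = createMiniRouter<Routes>({
    PRODUCT: ({ request }) => `sku=${request.userData.sku}`,
    CATEGORY: ({ request }) => `category=${request.userData.categoryId}`,
});

// `{ token: string }` fails the route-map constraint (a string is not a dictionary),
// so the call resolves to overload 2 and every handler sees `{ token: string }`.
const flat = createMiniRouter<{ token: string }>({
    LIST: ({ request }) => `list:${request.userData.token}`,
    DETAIL: ({ request }) => `detail:${request.userData.token}`,
});

const results = [
    perLabel({ request: { label: 'PRODUCT', userData: { sku: 'A1' } } }),
    flat({ request: { label: 'DETAIL', userData: { token: 't0' } } }),
];
```

One consequence in this sketch: a flat shape whose values are all objects, such as `{ meta: { id: number } }`, satisfies the route-map constraint and is therefore read as a route map with a single `meta` label.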
10 changes: 8 additions & 2 deletions packages/cheerio-crawler/src/internals/cheerio-crawler.ts
Expand Up @@ -12,6 +12,7 @@ import type {
InternalHttpHook,
RequestHandler,
RequestProvider,
RouterHandler,
RouterRoutes,
SkippedRequestCallback,
} from '@crawlee/http';
Expand Down Expand Up @@ -329,9 +330,14 @@ export async function cheerioCrawlerEnqueueLinks(
* await crawler.run();
* ```
*/
export function createCheerioRouter<
Context extends CheerioCrawlingContext = CheerioCrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createCheerioRouter<
Context extends CheerioCrawlingContext = CheerioCrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createCheerioRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}
102 changes: 87 additions & 15 deletions packages/core/src/router.ts
Expand Up @@ -7,15 +7,34 @@ import type { Awaitable } from './typedefs';

const defaultRoute = Symbol('default-route');

export interface RouterHandler<Context extends Omit<RestrictedCrawlingContext, 'enqueueLinks'> = CrawlingContext>
extends Router<Context> {
/**
* The crawling context received by a route handler, with `request.userData` narrowed to `UserData`.
*/
export type RouterHandlerContext<Context, UserData extends Dictionary> = Omit<Context, 'request'> & {
request: LoadedRequest<Request<UserData>>;
};

/**
* The set of labels accepted by {@apilink Router.addHandler}. When the router declares a concrete
* route map (e.g. `{ PRODUCT: ...; CATEGORY: ... }`), only those labels (plus symbols) are
* allowed — unknown labels become a compile-time error. When the map is left open (the default
* `Record<string, ...>`), any string or symbol label is accepted, preserving the original behaviour.
*/
export type RouterLabel<Routes extends Record<keyof Routes, Dictionary>> = string extends keyof Routes
? string | symbol
: (keyof Routes & string) | symbol;

export interface RouterHandler<
Context extends Omit<RestrictedCrawlingContext, 'enqueueLinks'> = CrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
> extends Router<Context, Routes> {
(ctx: Context): Awaitable<void>;
}

export type GetUserDataFromRequest<T> = T extends Request<infer Y> ? Y : never;

export type RouterRoutes<Context, UserData extends Dictionary> = {
[label in string | symbol]: (ctx: Omit<Context, 'request'> & { request: Request<UserData> }) => Awaitable<void>;
export type RouterRoutes<Context, Routes extends Record<keyof Routes, Dictionary>> = {
[Label in keyof Routes]: (ctx: Omit<Context, 'request'> & { request: Request<Routes[Label]> }) => Awaitable<void>;
};
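The `RouterLabel` type above switches on whether `string` is assignable to `keyof Routes`, which holds only for an open `Record<string, ...>` map. A small sketch of that check in isolation follows, reusing the conditional type from the diff with a simplified `Dictionary`. The helper functions `registerClosed` and `registerOpen` are hypothetical and exist only to give the types something to constrain.

```ts
// Simplified stand-in for illustration only; NOT Crawlee's real `Dictionary`.
type Dictionary = Record<string, any>;

// Same shape as the conditional type in the diff.
type RouterLabel<Routes extends Record<keyof Routes, Dictionary>> = string extends keyof Routes
    ? string | symbol
    : (keyof Routes & string) | symbol;

interface Routes {
    PRODUCT: { sku: string; price: number };
    CATEGORY: { categoryId: string };
}

type ClosedLabel = RouterLabel<Routes>; // 'PRODUCT' | 'CATEGORY' | symbol
type OpenLabel = RouterLabel<Record<string, Dictionary>>; // string | symbol

// Hypothetical helpers that only echo the label back as a string.
function registerClosed(label: ClosedLabel): string {
    return typeof label === 'symbol' ? label.toString() : label;
}

function registerOpen(label: OpenLabel): string {
    return typeof label === 'symbol' ? label.toString() : label;
}

const accepted = [
    registerClosed('PRODUCT'), // a declared label
    registerClosed(Symbol('internal')), // symbols are always allowed
    registerOpen('ANYTHING'), // an open map accepts any string
];

// @ts-expect-error 'TYPO' is not a key of `Routes`, so the closed label type rejects it.
registerClosed('TYPO');
```

The rejected call still runs if the type error is suppressed, because the check exists only in the type system.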

/**
Expand Down Expand Up @@ -82,8 +101,33 @@ export type RouterRoutes<Context, UserData extends Dictionary> = {
* ctx.log.info('...');
* });
* ```
*
* To get `request.userData` typed per label, declare a route map and pass it as the second
* type argument. The label passed to {@apilink Router.addHandler} then drives the type of
* `request.userData`, and unknown labels are rejected at compile time:
*
* ```ts
* import { createCheerioRouter, CheerioCrawlingContext } from 'crawlee';
*
* interface Routes {
*     PRODUCT: { sku: string; price: number };
*     CATEGORY: { categoryId: string };
* }
*
* const router = createCheerioRouter<CheerioCrawlingContext, Routes>();
*
* router.addHandler('PRODUCT', async ({ request }) => {
*     request.userData.sku; // string
*     request.userData.price; // number
* });
*
* router.addHandler('TYPO', async () => {}); // compile error: not a known label
* ```
*/
export class Router<Context extends Omit<RestrictedCrawlingContext, 'enqueueLinks'>> {
export class Router<
Context extends Omit<RestrictedCrawlingContext, 'enqueueLinks'>,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
> {
private readonly routes: Map<string | symbol, (ctx: any) => Awaitable<void>> = new Map();
private readonly middlewares: ((ctx: Context) => Awaitable<void>)[] = [];

Expand All @@ -94,21 +138,36 @@ export class Router<Context extends Omit<RestrictedCrawlingContext, 'enqueueLink
protected constructor() {}

/**
* Registers new route handler for given label.
* Registers new route handler for given label. When the router declares a route map, the
* `label` is restricted to the declared labels and `request.userData` is typed accordingly.
*/
addHandler<Label extends keyof Routes & string>(
label: Label,
handler: (ctx: RouterHandlerContext<Context, Routes[Label]>) => Awaitable<void>,
): void;

/**
* Registers new route handler for given label, explicitly typing `request.userData` via the
* `UserData` type argument. Useful when the router has no declared route map (the open default)
* and you want to type a single handler, or to register a handler under a `symbol` label.
*/
addHandler<UserData extends Dictionary = GetUserDataFromRequest<Context['request']>>(
label: string | symbol,
handler: (ctx: Omit<Context, 'request'> & { request: LoadedRequest<Request<UserData>> }) => Awaitable<void>,
) {
label: RouterLabel<Routes>,
handler: (ctx: RouterHandlerContext<Context, UserData>) => Awaitable<void>,
): void;

addHandler(label: string | symbol, handler: (ctx: any) => Awaitable<void>): void {
this.validate(label);
this.routes.set(label, handler);
}

/**
* Registers default route handler.
* Registers default route handler. As a fallback it can receive any request (including labels not
* declared in the route map), so `request.userData` defaults to the context's `userData` type
* (loosely typed by default). Pass an explicit `UserData` type argument to narrow it.
*/
addDefaultHandler<UserData extends Dictionary = GetUserDataFromRequest<Context['request']>>(
handler: (ctx: Omit<Context, 'request'> & { request: LoadedRequest<Request<UserData>> }) => Awaitable<void>,
handler: (ctx: RouterHandlerContext<Context, UserData>) => Awaitable<void>,
) {
this.validate(defaultRoute);
this.routes.set(defaultRoute, handler);
Expand Down Expand Up @@ -174,11 +233,24 @@ export class Router<Context extends Omit<RestrictedCrawlingContext, 'enqueueLink
* await crawler.run();
* ```
*/
// Two overloads keep the second type argument backwards compatible. When it is a route map (every
// value is a `Dictionary`) the first overload applies and labels are typed per route. Otherwise it
// fails the `Record<keyof Routes, Dictionary>` constraint and falls through to the second overload,
// where it is treated as the legacy flat `userData` shape shared by all handlers.
static create<
Context extends Omit<RestrictedCrawlingContext, 'enqueueLinks'> = CrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;

static create<
Context extends Omit<RestrictedCrawlingContext, 'enqueueLinks'> = CrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>): RouterHandler<Context> {
const router = new Router<Context>();
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;

static create<Context extends Omit<RestrictedCrawlingContext, 'enqueueLinks'> = CrawlingContext>(
routes?: RouterRoutes<Context, any>,
): RouterHandler<Context, any> {
const router = new Router<Context, any>();
const obj = Object.create(Function.prototype);

obj.addHandler = router.addHandler.bind(router);
Expand All @@ -187,7 +259,7 @@ export class Router<Context extends Omit<RestrictedCrawlingContext, 'enqueueLink
obj.use = router.use.bind(router);

for (const [label, handler] of Object.entries(routes ?? {})) {
router.addHandler(label, handler);
router.addHandler(label, handler as (ctx: any) => Awaitable<void>);
}

const func = async function (context: Context) {
Expand All @@ -203,6 +275,6 @@ export class Router<Context extends Omit<RestrictedCrawlingContext, 'enqueueLink

Object.setPrototypeOf(func, obj);

return func as unknown as RouterHandler<Context>;
return func as unknown as RouterHandler<Context, any>;
}
}
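The two `addHandler` overloads in this file serve different cases: the first derives `userData` from the label, while the second takes an explicit `UserData` type argument and also accepts `symbol` labels. The sketch below models that split under simplified assumptions. `MiniRouter`, `Handler`, `LabelOf` and `dispatch` are hypothetical names, not part of Crawlee, and `Dictionary` is reduced to `Record<string, any>`.

```ts
// Simplified stand-ins for illustration only; these are NOT Crawlee's real types.
type Dictionary = Record<string, any>;
type Handler<UserData extends Dictionary> = (ctx: { request: { userData: UserData } }) => void;
type LabelOf<Routes> = string extends keyof Routes ? string | symbol : (keyof Routes & string) | symbol;

class MiniRouter<Routes extends Record<keyof Routes, Dictionary> = Record<string, Dictionary>> {
    private readonly routes = new Map<string | symbol, Handler<any>>();

    // Overload 1: the label drives the `userData` type via the route map.
    addHandler<Label extends keyof Routes & string>(label: Label, handler: Handler<Routes[Label]>): void;
    // Overload 2: explicit `UserData`, for open routers and for symbol labels.
    addHandler<UserData extends Dictionary = Dictionary>(label: LabelOf<Routes>, handler: Handler<UserData>): void;
    addHandler(label: string | symbol, handler: Handler<any>): void {
        this.routes.set(label, handler);
    }

    // Hypothetical dispatcher used only to exercise the registered handlers.
    dispatch(label: string | symbol, userData: Dictionary): void {
        const handler = this.routes.get(label);
        if (!handler) throw new Error(`No handler for label: ${String(label)}`);
        handler({ request: { userData } });
    }
}

const log: string[] = [];

// Open router (no route map): type a single handler through overload 2.
const open = new MiniRouter();
open.addHandler<{ id: number }>('DETAIL', ({ request }) => {
    log.push(`detail:${request.userData.id + 1}`);
});

interface Routes {
    PRODUCT: { sku: string };
}

// Router with a route map: string labels use overload 1, symbol labels use overload 2.
const INTERNAL = Symbol('internal');
const closed = new MiniRouter<Routes>();
closed.addHandler('PRODUCT', ({ request }) => {
    log.push(`product:${request.userData.sku}`);
});
closed.addHandler<{ reason: string }>(INTERNAL, ({ request }) => {
    log.push(`internal:${request.userData.reason}`);
});

open.dispatch('DETAIL', { id: 41 });
closed.dispatch('PRODUCT', { sku: 'A1' });
closed.dispatch(INTERNAL, { reason: 'retry' });
```

Keeping a single loosely typed implementation behind both overloads means the two call styles share one code path, so only the compile-time view differs.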
10 changes: 8 additions & 2 deletions packages/http-crawler/src/internals/file-download.ts
Expand Up @@ -13,6 +13,7 @@ import type {
InternalHttpCrawlingContext,
InternalHttpHook,
RequestHandler,
RouterHandler,
RouterRoutes,
} from '../index';
import { HttpCrawler, Router } from '../index';
Expand Down Expand Up @@ -329,9 +330,14 @@ export class FileDownload extends HttpCrawler<FileDownloadCrawlingContext> {
* await crawler.run();
* ```
*/
export function createFileRouter<
Context extends FileDownloadCrawlingContext = FileDownloadCrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createFileRouter<
Context extends FileDownloadCrawlingContext = FileDownloadCrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createFileRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}
10 changes: 8 additions & 2 deletions packages/http-crawler/src/internals/http-crawler.ts
Expand Up @@ -13,6 +13,7 @@ import type {
ProxyConfiguration,
Request,
RequestHandler,
RouterHandler,
RouterRoutes,
Session,
} from '@crawlee/basic';
Expand Down Expand Up @@ -1066,9 +1067,14 @@ function parseContentTypeFromResponse(response: unknown): { type: string; charse
* await crawler.run();
* ```
*/
export function createHttpRouter<
Context extends HttpCrawlingContext = HttpCrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createHttpRouter<
Context extends HttpCrawlingContext = HttpCrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createHttpRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}
10 changes: 8 additions & 2 deletions packages/jsdom-crawler/src/internals/jsdom-crawler.ts
Expand Up @@ -11,6 +11,7 @@ import type {
InternalHttpHook,
RequestHandler,
RequestProvider,
RouterHandler,
RouterRoutes,
SkippedRequestCallback,
} from '@crawlee/http';
Expand Down Expand Up @@ -448,9 +449,14 @@ function extractUrlsFromWindow(window: DOMWindow, selector: string, baseUrl: str
* await crawler.run();
* ```
*/
export function createJSDOMRouter<
Context extends JSDOMCrawlingContext = JSDOMCrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createJSDOMRouter<
Context extends JSDOMCrawlingContext = JSDOMCrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createJSDOMRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}
10 changes: 8 additions & 2 deletions packages/linkedom-crawler/src/internals/linkedom-crawler.ts
Expand Up @@ -10,6 +10,7 @@ import type {
InternalHttpHook,
RequestHandler,
RequestProvider,
RouterHandler,
RouterRoutes,
SkippedRequestCallback,
} from '@crawlee/http';
Expand Down Expand Up @@ -333,9 +334,14 @@ function extractUrlsFromWindow(window: Window, selector: string, baseUrl: string
* await crawler.run();
* ```
*/
export function createLinkeDOMRouter<
Context extends LinkeDOMCrawlingContext = LinkeDOMCrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createLinkeDOMRouter<
Context extends LinkeDOMCrawlingContext = LinkeDOMCrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createLinkeDOMRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}
Original file line number Diff line number Diff line change
Expand Up @@ -752,9 +752,14 @@ export class AdaptivePlaywrightCrawler extends PlaywrightCrawler {
}
}

export function createAdaptivePlaywrightRouter<
Context extends AdaptivePlaywrightCrawlerContext = AdaptivePlaywrightCrawlerContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createAdaptivePlaywrightRouter<
Context extends AdaptivePlaywrightCrawlerContext = AdaptivePlaywrightCrawlerContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createAdaptivePlaywrightRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}
10 changes: 8 additions & 2 deletions packages/playwright-crawler/src/internals/playwright-crawler.ts
Expand Up @@ -5,6 +5,7 @@ import type {
BrowserRequestHandler,
GetUserDataFromRequest,
LoadedContext,
RouterHandler,
RouterRoutes,
} from '@crawlee/browser';
import { BrowserCrawler, Configuration, Router } from '@crawlee/browser';
Expand Down Expand Up @@ -286,9 +287,14 @@ export class PlaywrightCrawler extends BrowserCrawler<
* await crawler.run();
* ```
*/
export function createPlaywrightRouter<
Context extends PlaywrightCrawlingContext = PlaywrightCrawlingContext,
Routes extends Record<keyof Routes, Dictionary> = Record<string, GetUserDataFromRequest<Context['request']>>,
>(routes?: RouterRoutes<Context, Routes>): RouterHandler<Context, Routes>;
export function createPlaywrightRouter<
Context extends PlaywrightCrawlingContext = PlaywrightCrawlingContext,
UserData extends Dictionary = GetUserDataFromRequest<Context['request']>,
>(routes?: RouterRoutes<Context, UserData>) {
return Router.create<Context>(routes);
>(routes?: RouterRoutes<Context, Record<string, UserData>>): RouterHandler<Context, Record<string, UserData>>;
export function createPlaywrightRouter(routes?: RouterRoutes<any, any>) {
return Router.create<any, any>(routes);
}